ABSTRACT


This study is an experimental inquiry into the
perceptual dimensions involved in the perception of a
selected set of English consonantal phonemes. The
methodology and substantive findings of previous attempts to
determine the perceptual reality of phonological features,
or to provide a direct characterization of perceptual
features without recourse to a specific formal linguistic
framework, are reviewed. Particular attention is given to
multidimensional scaling (MDS) as a methodological tool and
to the results of MDS of minimal speech sounds.


Four experiments were performed with related sets of
English CV syllables, employing MDS and factor analysis of
directly and indirectly elicited judgements of perceptual
similarity. Analysis of the data yielded two major
dimensions, each with a ready auditory/acoustic
characterization. Various phonological feature systems and
measurable acoustic properties of the experimental stimuli
were used as predictors of both the raw perceptual proximity
matrix and the set of MDS derived inter-stimulus distances.
It was found that the MDS configuration could be
reconstructed with a fairly high degree of accuracy
(Pearson's Product Moment Correlation = .81) from the
physical duration of the consonantal portion of the stimulus
and a simple bandfilter function, labelled the
Resonance-Hiss dimension. Theoretical implications of the
findings for a model of perception at the phonemic level are
discussed.


CHAPTER I


INTRODUCTION 


Although it is just one facet of the question of speech
perception, phonemic recognition denotes a perceptual
capability that is arguably fundamental for any general
model of how the listener extracts linguistic messages from
the highly variable signal of human speech. It is
fundamental in the sense that higher order syntacto-semantic
elements of the message are predicated upon the extraction
of a certain (unspecified but necessary) amount of
phonological information from the signal. Some qualification
may be in order here.


In all probability speech perception is a multilevel
process. Superimposed upon the basic information flow from
concrete auditory perceptual targets to abstract conceptual
elements of the message, there appears to be a substantial
amount of independent parallel processing of the signal at
different levels of analysis. Decoding at the phonological
level is no doubt partially directed by the listener's
expectations or anticipations about the content of the
message formed on the basis of ongoing semantic and
syntactic processing. Phonological decoding is also most
likely facilitated by the listener's knowledge of segmental
and sequential redundancies that form the characteristic
sound pattern of his native language. (The efficacy of this
latter class of factors is attested by the listener's
characteristic insensitivity to phonetic variation from the
phonotactic constraints of his language.)


Despite the obvious fact that no particular level of
signal analysis is functionally independent of any other in
the perceptual process (and indeed the isolation of any
particular level of analysis seems to involve a degree of
imposition of an artificial conceptual framework on the
phenomenon under study), a case can be made for considering
phonemic recognition as a significant and isolatable level
in speech perception. Native listeners are remarkably
consistent in their ability to extract sequences of phonemic
targets from the quasi-continuous and highly variable speech
signal. As Smith (1973) and others have observed, this
achievement is of comparable complexity to the attainment of
object constancy in vision, despite the instability of the
pattern of retinal stimulation.


Several sources of complexity in the mapping between
the acoustic signal and the stable phonemic target are
identifiable: structural differences between the vocal
tracts of different speakers; idiolectal variation arising
from idiosyncrasies in a speaker's manner of speech
production; dialectal variation; coarticulation effects
and other variations in the phonetic realization of a
phonemic target associated with the linguistic environment
in which the target is embedded; the background listening
conditions themselves.


Although something of a linguistic truism, the notion
that the inventory of phonemic targets embodies the set of
minimal meaning-differentiating sound contrasts for a
particular language is also an important consideration for
any model of speech perception. An efficient lexical storage
code (and some kind of lexical storage system is an
obviously necessary component of any model of perception or
production) must in some way utilize this set of contrasts.
It is, of course, conceivable that each lexical item could
be recognized or reproduced on the basis of feature
specifications unique to that item (the maximally
inefficient option), but this would raise difficulties
explaining how listeners can readily reproduce (through
mimicry or orthographic transcription) phonemic sequences
that, in all likelihood, they have never heard before. Also,
as has often been remarked (e.g., Schane, 1973), it would be
difficult to account for speakers' intuitions about the
identity or contrast of particular phonemic targets in
different linguistic environments if the psychological
reality of the phonemic level of representation is denied.


However, it has often been argued to the contrary that
phonemic recognition is more correctly regarded as a
cognitive rather than a perceptual skill and that it is
derivatively based on one more strictly perceptual unit of
signal analysis. Massaro (1972) presents a good deal of
evidence (not all of it sound) "in support of the
interpretation... that the phoneme is not perceived
directly...but is inferred from the identification of the
syllable or word." Massaro takes the classical
spectrographic evidence from synthetic stop consonants that
Liberman et al. (1957, 1967) used to support the parallel
decoding of target phonemes upon segments of the acoustic
signal of the order of the syllable as "... convincing
support for eliminating the phoneme as the perceptual unit
for processing speech." At the present time it is necessary
to leave the question of the appropriate units of analysis
open. Suffice it to point out that what can be regarded as
phonemic recognition can also be satisfactorily described
in either syllabic, segmental, or subphonemic terms.


Knowledge of the perceptual processes underlying
phonemic recognition is at present very slight. Perhaps the
most influential source of theoretical constructs in the
past 20 years has been the Jakobson, Fant, and Halle
monograph Preliminaries to Speech Analysis (1951).


Essentially, their model postulates a highly
restricted, language universal, set of binary perceptual
"features", each with a more or less clearly defined
acoustic referent. Target phonemes are presumably mutually
discriminated and identified by the speech processor on the
basis of an array or pattern of "on-off" feature detector
output states. The feature detectors are generally conceived
to operate independently and in parallel upon the input
signal (or more precisely, upon an appropriately speaker and
speech rate normalized transform of the original signal).
The authors draw attention to the fact that in the speech
signal there is really no simple linear ordering of acoustic
cues corresponding to a feature specification of the
sequence of phonemes given by a phonological representation
of the message.


A phonemic target is formally defined by an array of
feature values. But the feature specification is not
invariant over different realizations of the target in the
speech signal. Phonological processes (such as assimilation
and neutralization) are largely responsible for this
variability. For correct phonemic recognition in many
instances, the feature detector would have to take account
of features of immediately surrounding phonemic targets and
"know" the relevant sequential constraints on feature
combination for the language in question. This complicates
but does not invalidate the simple parallel feature
extraction model previously outlined.


More serious consequences flow from the observation
(implied if not explicitly stated in the Jakobson, Fant, and
Halle account) that the perceptually relevant signal
properties for recognizing a given feature are not
necessarily those acoustic properties associated with the
definition of that feature. Aspiration, for example, is in
all likelihood the most salient auditory cue for the
Tense-Lax feature in English stops, word or syllable initially.
However, in word or syllable final position the "redundant"
feature of the length of the immediately preceding vowel
seems to be the salient cue (Denes, 1955; Peterson &
Lehiste, 1960; Raphael, 1971). Apart from the unfortunate
semantic consequence of requiring "redundant" or
"predictable" phonetic features to serve as the basis for
inferring the presence of otherwise imperceptible
"distinctive" features, the motivation for postulating
distinctive features in a perceptual model is considerably
weakened. As Smith (1973) has commented:

     Superficially at least, the theoretical
     advantages of this (the distinctive
     feature) approach are enormous. Rather
     than store the vast range of possible
     realizations ... of the phoneme /p/ and
     discriminate such patterns from a large
     number of alternative patterns, the
     perceiver needs only to keep track of a
     small number of features, each feature
     limited in the number of feature values
     it can adopt (p. 512).

But at least some of the features in the Jakobson, Fant, and
Halle system must be of comparable complexity with the
phonemic targets themselves, in terms of their mapping onto
perceptually salient characteristics of the auditory signal.
If this is so, then the question that naturally arises is,
what does a model of perception gain by postulating such
entities? Also, doubt is cast on the perceptual reality of
the features themselves. (In what sense is the "feature"
which distinguishes /i/ and /I/ the same as that which
differentiates /p/ from /b/ and /s/ from /z/?) For
independent reasons, one may wish to retain features such as
"gravity" and "tenseness" in a performance model. However,
it should be recognized that doing so will probably
complicate rather than simplify the account of the
perceptual process.


The question of the relationship between phonological
theory and a theory of speech perception at the phonological
level will be dealt with more fully below (Chapter II).
Suffice it for present purposes to note the change in
connotation that the term "distinctive feature" is apt to
undergo as the topic shifts from phonological theory to
speech recognition. In the former context distinctive
features are primarily classificatory devices, conceived by
phonologists, for the purpose of grouping segments that
undergo similar phonological processes in identical
linguistic environments. According to the celebrated
"simplicity metric" of Halle (1964), that feature or feature
system which most economically designates such "natural"
classes of segments is most highly valued. In the context of
a perceptual model, however, a "distinctive feature" refers
to some attribute (simple or complex from the viewpoint of a
physical description of the signal) that serves for the
native listener to differentiate between members of a set of
perceptual targets. It is common but dangerous to treat
these two senses of the term "distinctive feature"
as synonymous. Thus Schane (1973) suggests: "The more
features operating to distinguish segments, the more
different those segments are and the greater their
perceptual distance." The features to which Schane is
referring are those of the M.I.T. School of generative
phonology. While there does appear to be some positive
relationship between most distinctive feature counts and the
perceptual distance between the target phonemes (Smith,
1973; Singh, 1974), there are no good grounds for assuming
that an optimal set of classificatory features for a
generative phonology will optimally predict perceptual
contrasts.
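
The feature-count notion of distance quoted from Schane can be made concrete with a small sketch. It counts the features on which two segments differ. The three binary features and the values assigned to the four consonants are illustrative assumptions made for this example only, not the specification of any published feature system, and the function name is hypothetical.

```python
# Illustrative binary feature values (assumed for this sketch only).
FEATURES = ("voiced", "continuant", "strident")
SEGMENTS = {
    "p": (0, 0, 0),
    "b": (1, 0, 0),
    "s": (0, 1, 1),
    "z": (1, 1, 1),
}


def feature_count_distance(a, b):
    """Number of features on which segments a and b take different values."""
    return sum(x != y for x, y in zip(SEGMENTS[a], SEGMENTS[b]))


if __name__ == "__main__":
    for pair in (("p", "b"), ("p", "s"), ("p", "z"), ("s", "z")):
        print(pair, feature_count_distance(*pair))
```

A count of this kind is only a prediction. Whether it agrees with experimentally obtained perceptual distances is an empirical matter, which is the point at issue in the text.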


Interest in this paper focuses upon developing a
feature system that is optimal from the perceptual point of
view. Such a system must be capable of representing the
entire set of phonemic contrasts that the hearer is
accustomed to making and which are characteristic of his
native language. More than this, however, a perceptually
based feature system should reflect the relative perceptual
salience of different phonemic contrasts for the
phonetically untrained listener. It is intuitively evident
that certain phonemically relevant sound contrasts are more
easily detected by the auditory system than others. Such
contrasts are probably acquired earlier in the course of
language development and, it is reasonable to speculate,
will be utilized in a greater proportion of the world's
languages. The isolation of maximally perceptually
contrastive phonemic classes - particularly if such classes
should prove universally marked in human languages and
ontogenetically prior to other, more language specific,
phonemic contrasts - would obviously be very important for
the development of a phonological theory that seeks to
attain a level of "explanatory" adequacy. Moreover, it is
difficult to see how, within the confines of current
methodology in phonology, such perceptual classes or sound
groupings could be identified, if for no other reason than
that production and perception considerations are thoroughly
confounded.


A perceptually based feature system should also specify
what auditory properties of the signal are used in phonemic
recognition, and thereby indicate something of the nature of
the sound-analysing devices employed by the human perceptual
apparatus. It is commonly agreed (and also implied in all of
the proposed distinctive feature systems) that the nature of
the auditory stimulus for phonemic recognition is
multidimensional. But there is no substantial agreement
about the number or the nature of these supposed perceptual
dimensions. There is, correspondingly, little agreement
about the general properties of the sound-analysing devices
required for phonemic recognition.




Those associated with one highly influential trend in
speech perception research, centred around the pioneering
work of Liberman, Cooper, and their associates at Haskins
Laboratories, have argued for the unique character of speech
recognition as distinct from other forms of auditory
recognition. They have postulated signal analysing devices
(conceived of as specific linguistic feature detectors) that
uniquely function for the recognition of speech. Studies
under the dichotic listening paradigm (Kimura, 1961;
Studdert-Kennedy & Shankweiler, 1970; Darwin, 1971; Day &
Bartlett, 1971, to mention only a notable few), and a
couple of recent experiments with Auditory Evoked Responses
(Wood, Goff, & Day, 1971; Wood, 1973), it is claimed,
support a neurophysiological locus for these perceptual
devices in the left temporal hemisphere. It has also been
suggested, on the basis of Voice Onset Time studies (Eimas
et al., 1971), that these devices are "prewired" as part of
the genetically transmitted neurological substrate of human
linguistic capabilities. Such a notion is compatible with
Jakobson's idea that there exists a relatively small set of
features for constructing phonemic oppositions, a subset of
which is realized in any given language.


We may contrast this view with the one (perhaps best
typified by Lane (1965)) which argues that the stimulus
dimensions employed, and the processes underlying phonemic
recognition, are not uniquely linguistic (part of some
innate human "faculté de langage") but ultimately derive
from general properties of the auditory-perceptual system in
interaction with an equally general process of
discrimination learning. There may be selective "sharpening"
or "suppression" of many potentially audible characteristics
of the signal when it is processed as speech, but (the
argument runs) there is no need to invoke special linguistic
features and their corresponding detection devices to
explain phonemic recognition.


These two conflicting theoretical dispositions are
useful for characterising "the state of the art" in speech
perception research and for pointing up major specific
problem areas. Individual writers can be roughly located on
this theoretical continuum, ranging from strong "nativist"
to strong "empiricist" positions.


The nativist position is characterised by strong
assumptions about the nature of the user's language code and
strong emphasis upon the differences between speech
recognition and other modes of auditory perception.
Proponents of this view have a tendency to accept the rubric
of modern phonological theory (in particular, the generative
phonology of Jakobson, Halle, Chomsky, Postal, et al.) for
what it purports to be - a competence model, or an abstract
representation of what the speaker-hearer "knows" about the
sound pattern of his language.


The "empiricist" position makes minimal assumptions
about the nature of the code and emphasises the continuity
between speech perception (at the level of phonemic
recognition) and what is known about the process of auditory
perception in general. Proponents of this view will tend to
reject linguistically derived constructs such as "phonemes"
or "distinctive features" as components of a perceptual
model - much less accept that the brain has evolved special
perceptual decoding mechanisms to detect such entities in
the speech signal. The major dimensions involved in speech
perception should be predictable from the acoustic
properties of the signal, the response characteristics of
the auditory system, and the language experience of the
listener.


Present knowledge is quite inadequate for choosing
between these two broadly sketched alternatives. With
respect to the "empiricist" view, it may be observed that
there is insufficient understanding of the human auditory
system to specify what major auditory parameters would be
used for differentiating signals with roughly the same
source characteristics as human speech. The great
developments in instrumentation for acoustic research over
the last 25 years have, in this respect, merely served to
focus more sharply the problem of choosing a perceptually
revealing physical representation of the signal.


With respect to the "nativist" view and the problem of
characterising the language user's code, it is clear that
this is very much an open question. The grammarian's formal
constraints on rule writing (insofar as such agreed upon
principles exist in this controversial area) have no clear
relevance or plausibility for a model of what the user
"knows" (consciously or tacitly) about his language, as
revealed in language use. For example, many postulated
phonological processes reveal morphological relationships
which permit a good deal of simplification and size
reduction of the lexicon. But the psychological implication,
that the resulting savings of lexical storage "space" are
relevant for the language user, is quite unfounded (see
Derwing, 1973, p.154, note 2).


Obviously not all the theoretically interesting
questions which are raised by the two conflicting
dispositions outlined above will be answerable in the
foreseeable future. However, what - with an admittedly uneasy
choice of terminology - may be labelled the
"nativist-empiricist" controversy in speech perception points
to two potentially fruitful hypotheses about the nature of the
perceptual dimensions involved in phonemic recognition:


     H1: The perceptual dimensions are unique to sounds
     of speech and presuppose a special "speech mode" of
     perceptual processing. Such dimensions may be either
     language specific (reflecting the particular system of
     phonological contrasts of a particular language) or
     universal (reflecting the set of "possible" phonemic
     oppositions).


     H2: The perceptual dimensions used in
     phonemic recognition are not speech or language
     specific but general auditory properties ultimately
     attributable to the way the auditory apparatus
     responds to complex environmental sounds with acoustic
     properties roughly similar to the human speech signal.
     (A necessary qualification on this hypothesis should
     probably be made to the effect that any "general
     auditory parameters" will be modified or "tuned" by
     specific learning experience and the demands imposed
     by the sound contrasts of a given language.)


The methodological question which is prior to these
hypotheses, of how such perceptual dimensions are to be
isolated and described, is taken up in the following
section.


Methodology


At the risk of oversimplification, two major research
strategies to study the problem of phonemic recognition are
discernible. These two strategies are respectively
associated with two types of difficulty confronting the
researcher - the twin horns of the speech perception dilemma
- indeterminacy with respect to the description of the
signal and indeterminacy with respect to the description of
the code.


The first research strategy, which forms the dominant
paradigm in contemporary phonetics, directly addresses the
first horn of the dilemma. By controlled manipulation of
physical parameters of the signal the investigator attempts
to isolate those features of the signal that are relevant
for some specific phonemic distinction or class of
distinctions. The power of this paradigm has only been
realised with the development of speech synthesis (beginning
with the early Pattern Playback method) and computer based
techniques of signal manipulation and reconstruction
(Roszypal, 1974). However, this approach has real
limitations from the viewpoint of the second horn of the
dilemma. The perceptual importance of the linguistic
distinction under investigation must be taken as given. The
now extensive body of research on the topic of voice onset
time (VOT) is a convenient example here. VOT is just one of
a number, and probably not the most salient, of acoustic
cues for the recognition of voicing, and the perceptual
status of the voice feature is itself problematical. What,
for example, is its relative prominence compared with other
phonemic distinctions such as stridency? What justification
is there, from the perceptual point of view, for regarding
voicing as a homogeneous feature, applicable right across
the phonemic inventory?


In short, the dominant paradigm of experimental
phonetics yields a great deal of precise information about
properties of the signal associated with specific
phonological distinctions deemed relevant to phonemic
recognition. But it is difficult to decide from all this
information what properties of the signal are of greater or
lesser importance for the perceptual system, and how the
collective findings of a large number of contextually
restricted experiments can be integrated into a general
model of phonemic recognition.


The second paradigm (which can be ascribed to the
psychologists, having generously yielded the former to the
phoneticians) is focused less upon the signal than the
linguistic code itself, or more precisely, upon the problem
of determining an adequate representation of speech sounds
as perceptual end-products of the decoding process. It
begins with a consideration of the total set of phonemic
distinctions the listener is normally capable of making.
This is embodied in the full phonemic inventory of the
listener's native language and the researcher who opts for
this paradigm attempts to map the perceptual relationships
that obtain amongst the set of target phonemes. Such a
mapping will yield the perceptually most salient contrasts
amongst the set of target phonemes, from which it should be
possible to draw inferences about what features of the
signal are most important for phonemic recognition and
perhaps, too, broad suggestions as to how the decoding takes
place. Hence, although the two paradigms outlined above are
mutually complementary, it seems that the second has a
certain logical priority.


The first step in mapping the perceptual relationships
among a set of target phonemes is to obtain a matrix of
proximities - a convenient numerical representation of the
degree of perceptual relatedness between any given target
and all other targets in the set. A proximity matrix is a
set of experimentally generated estimates which serves as
the data base for, or the empirical constraint upon, the
investigator's attempt to model the output of the perceptual
process. Phonemic targets that share a common perceptual
basis will have high valued entries in the proximity matrix.
The most perceptually contrastive will have the lowest
entries. The matrix is symmetrical about the main diagonal
where the values (which represent the degree of relatedness
between a target and itself) are maximal.
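
The structure just described can be illustrated with a minimal sketch that assembles a symmetric proximity matrix from pairwise similarity ratings. Everything specific here is an assumption for illustration: the function name, the averaging of the two presentation orders of a pair, and the choice of a fixed maximal value for the diagonal are not taken from the experiments reported later.

```python
def build_proximity_matrix(targets, ratings, self_similarity):
    """Average pairwise ratings into a symmetric proximity matrix.

    targets: list of stimulus labels.
    ratings: iterable of ((a, b), value) similarity judgements, a != b.
    self_similarity: value placed on the main diagonal (assumed maximal).
    Pairs that were never rated are left as None.
    """
    index = {t: i for i, t in enumerate(targets)}
    n = len(targets)
    sums = [[0.0] * n for _ in range(n)]
    counts = [[0] * n for _ in range(n)]
    for (a, b), value in ratings:
        if a == b:
            raise ValueError("self-pairs are fixed by self_similarity")
        i, j = index[a], index[b]
        # Pool both presentation orders so the matrix is symmetric.
        for r, c in ((i, j), (j, i)):
            sums[r][c] += value
            counts[r][c] += 1
    matrix = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = float(self_similarity)
            elif counts[i][j]:
                matrix[i][j] = sums[i][j] / counts[i][j]
    return matrix


if __name__ == "__main__":
    # Hypothetical ratings on a 1-9 similarity scale.
    demo = [(("pa", "ba"), 7), (("ba", "pa"), 5),
            (("pa", "sa"), 2), (("ba", "sa"), 1)]
    for row in build_proximity_matrix(["pa", "ba", "sa"], demo, 9):
        print(row)
```

Pooling the two orders of presentation is only one of several possible treatments. An asymmetric matrix could equally be retained if order effects were themselves of interest.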


A variety of experimental methods have been developed
for generating perceptual proximity matrices. The methods
differ in their scaling procedures or, more
importantly, in the kind of data base that the
experimenter chooses to adopt. This methodological variation
greatly complicates any review of the reported research,
because different methods of generating a proximity matrix
may have different implications for the study of speech
perception. For example, Wickelgren's (1966) oft quoted
study of phonemic substitutions is of much greater relevance
for processes involving the retention of phonological
information (say, for purposes of lexical retrieval) than
for first or second order perceptual decoding. Chapter III
of this paper discusses the various methods of generating
proximity matrices that have been proposed, compares their
differential theoretical implications and reviews the major
findings reported to date.


Obtaining a proximity matrix is only the initial step
in mapping perceptual relationships among a set of
perceptual targets. A variety of powerful data reduction
techniques have been developed for characterising the
presumed latent structure in a proximity matrix. Each of
these data reduction techniques imposes a given mathematical
model upon the data. The question of the appropriateness of
particular formal structures for a perceptual model
therefore becomes a critical question, but one very
difficult to answer.


Usually the latent structure of a proximity matrix is
characterised upon the assumption that proximities may be
interpreted as distances (or, more precisely, as some
function of the unknown distances) in a space of certain
specified metric properties and dimensionality. A family of
data reduction procedures known as "multidimensional
scaling" (MDS) has been developed over the last 20 years
(primarily in the last 10) for obtaining such spatial
representations of the proximity matrix. Also relevant are
the related methods of "factor analysis." Chapter IV deals
with the basic procedures and foundational assumptions of
MDS as it applies to the study of speech perception.
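
As an indication of what "obtaining a spatial representation" involves, the following is a minimal sketch of classical (Torgerson) metric scaling, one member of the MDS family. It is offered as an assumption-laden illustration and not as the procedure used in the experiments: it takes a matrix that is already in distance form, uses a plain power iteration to extract the leading eigenvectors, and the function names are hypothetical.

```python
import math


def classical_mds(dist, n_dims=2, iters=500):
    """Classical metric MDS of a symmetric distance matrix.

    Double-centres the squared distances, then extracts the leading
    eigenvectors by power iteration with deflation. Returns one
    coordinate list per stimulus.
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(r) / n for r in sq]
    grand = sum(row_mean) / n
    # B = -1/2 * J * D^2 * J (symmetry makes row and column means equal).
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * n_dims for _ in range(n)]
    for k in range(n_dims):
        v = [math.sin(i + 1 + k) for i in range(n)]  # deterministic start
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                  for i in range(n))
        if lam <= 1e-12:
            break  # no further positive dimension to extract
        for i in range(n):
            coords[i][k] = math.sqrt(lam) * v[i]
        # Deflate: remove the extracted component from B.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)]
             for i in range(n)]
    return coords


def interpoint_distance(coords, i, j):
    """Euclidean distance between stimuli i and j in the derived space."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(coords[i], coords[j])))
```

When the input distances are exactly Euclidean in the chosen number of dimensions, the derived interpoint distances reproduce them. With real proximity data the fit is only approximate, and a nonmetric procedure, which uses only the rank order of the proximities, may be preferred.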


Succeeding chapters describe a series of experiments
with selected sets of English consonantal phonemes embedded
in a CV frame, employing multidimensional scaling and factor
analysis of subjects' direct and indirect judgements of
perceptual similarity. The aim of all of these experiments
was to attempt to determine the number and nature of the
major perceptual dimensions native listeners employ in
recognizing the consonantal phonemes of English.


Ancillary to this major theme are questions concerning:

(1) the relative perceptual prominence of various
phonologically derived "distinctive features" and the
perceptual adequacy of different distinctive feature
systems.

(2) the perceptual and acoustic correlates of
interpretive factors derived from the MDS studies.

Multiple regression analysis was used as the major
analytical tool here. Both the derived interpoint distances
and the raw proximity scores were used as dependent
variables to which linear, least squares, fittings were
obtained for selected sets of phonologically, perceptually,
and acoustically based independent variables.
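
The kind of fitting described above can be sketched as an ordinary least squares regression of a dependent variable (for example, a set of interpoint distances) on several predictors, followed by a Pearson product-moment correlation between the fitted and observed values. The sketch below is a generic illustration under stated assumptions: the function names are hypothetical, the normal equations are solved directly, and the predictors stand for whatever phonological or acoustic variables are chosen.

```python
import math


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_least_squares(predictors, response):
    """Return [intercept, b1, b2, ...] minimising squared error."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, predictors):
    """Fitted values for each row of predictors."""
    return [coeffs[0] + sum(c * x for c, x in zip(coeffs[1:], p))
            for p in predictors]


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)
```

The correlation between `predict(...)` and the observed values is the multiple correlation coefficient of the fit, which is the form in which a figure such as the .81 cited in the Abstract would be expressed.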




CHAPTER II


PHONOLOGICAL THEORY IN RELATION TO A PERCEPTUAL MODEL 


In the previous chapter certain difficulties were noted
when the Jakobson, Fant, and Halle distinctive feature
theory is treated as a schematic outline for a model of
phonemic recognition. This raised the wider question of the
relevance of phonological theory for understanding how
listeners extract linguistic information from the speech
signal. This question is also urged by the claims generative
phonologists make for their grammars as descriptions of
"linguistic competence."


It is generally conceded that "the fundamental unit of 
generative phonology is the distinctive feature... [Harms, 
1968]." Therefore, the strategy adopted in this chapter for 
clarifying this question of the relevance of phonological 
theory for the study of speech perception will be to note 
what functions distinctive features serve and to inquire 
what constraints operate upon their postulation. On the 
basis of this analysis it should be possible to see how well 
the linguistic functions of distinctive features may 
subserve necessary or optionally useful functions of a 
speech recognition device. 

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Features 




Distinctive features fulfill three basic functions in 
phonological theory. Firstly, they are used to specify 
phonetic properties, so that the set of feature values 
assigned to a segment can provide an overall index of 
phonetic similarity with any other segment and so that 
sounds can be compared with respect to particular phonetic 
qualities. Secondly, features have traditionally been 
employed to specify phonemic oppositions - those sound 
contrasts of a language that are of special linguistic 
significance over other perceivable auditory contrasts in 
the signal. Thirdly, distinctive features are used for 
grouping together segments that undergo the same 
phonological processes. These may be respectively referred to 
as the phonetic, the phonemic, and the phonological (or 
classificatory) functions of distinctive features. 
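
The phonetic and classificatory functions can be made concrete with a small sketch. The three-feature inventory and the five segments below are chosen purely for illustration and do not reproduce any of the feature systems discussed in this chapter; the function names are hypothetical.

```python
# Illustrative binary feature matrix (1 = plus, 0 = minus).
FEATURES = ("voiced", "nasal", "continuant")
SEGMENTS = {
    "p": (0, 0, 0),
    "b": (1, 0, 0),
    "m": (1, 1, 0),
    "f": (0, 0, 1),
    "v": (1, 0, 1),
}


def shared_features(a, b):
    """Overall index of similarity: number of feature values in common."""
    return sum(x == y for x, y in zip(SEGMENTS[a], SEGMENTS[b]))


def contrast_on(a, b, feature):
    """True if the two segments differ on the named feature."""
    i = FEATURES.index(feature)
    return SEGMENTS[a][i] != SEGMENTS[b][i]


def natural_class(spec):
    """Segments matching every feature value in spec, e.g. {"voiced": 1}."""
    return sorted(
        seg for seg, values in SEGMENTS.items()
        if all(values[FEATURES.index(f)] == v for f, v in spec.items())
    )
```

On this toy matrix, `shared_features("p", "b")` counts two shared values, `contrast_on("p", "b", "voiced")` picks out the single feature on which the pair is opposed, and `natural_class({"voiced": 1, "nasal": 0})` groups /b/ and /v/ as a class that a rule could refer to.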


These functions are to some extent open to various 
interpretations, and different theorists do not necessarily 
ascribe to them the same relative importance. Phonetic 
relationships can be described with either production or 
perception primarily in mind. Perhaps the most obvious 
contrast between the Jakobsonian and the Chomsky and Halle 
feature systems involves just this orientation. For 
Jakobson, distinctive features were undoubtedly thought of 


as perceptual entities: 
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distinctive features... that to a large 
extent determines our perception of the 
speech sounds [Jakobson, Fant, & Halle, 
1957, p. 210]. 


... in its sound shape any language 
operates with discrete and polar 
distinctive features, and this polarity 
enables us to detect any feature 
functioning ceteris paribus 

4957 py pak 11s 


For the generative phonologists, on the other hand, phonetic 
features are defined mainly in articulatory terms with no 
explicit claims regarding their perceptual reality: 

The total set of features is identical 
with the set of properties that can in 
principle be controlled in speech; they 
represent the phonetic capabilities of 
Man, and we would assume are therefore 
the same for all languages [Chomsky & 
Halle, 1968, p. 259]. 

... the phonetic matrix is then 
descriptive of the fact that the human 
vocal system is composed of a number of 
subparts capable of independent action 
and of different types of action... 
[Postal, 1968, p. 59]. 

Any claim for the perceptual reality of these kinds of 
features is evidently extrinsic to the features themselves 
and must derive from a special hypothesis about the nature 
of speech perception, perhaps along the lines of the "motor 
theory" of perception of Liberman et al. (1967) or the 
"analysis by synthesis" model of Halle and Stevens (1967). 


Although he thought of features primarily in auditory- 
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perceptual terms, Jakobson was not concerned with a detailed 
auditory phonetic description of speech sounds. Distinctive 
features provided a vehicle for representing the set of 
phonemic oppositions of a language and Jakobson believed 
that it is these and not the "redundant" auditory-phonetic 
features of speech sounds to which the native speaker 


responds when he listens to speech. 


In itself, the phonemic function imposes no significant 
constraints upon the choice of features. Jakobson 
constrained feature choice by requiring that they be 
universally applicable to all languages and small in number. 
He found he was able to considerably reduce the number of 
necessary features by applying the same set of features to 
capture both vowel and consonantal phonemic oppositions: 

For example, it can be shown that the 
relation of the close to open vowels, on 
the one hand, and that of the labials 
and dental consonants to consonants 
produced against the hard and _ soft 
palate, on the other, are all 
implementations of a single opposition: 
diffuse vs. compact... In their turn the 
relations between the back and front 
vowels, and between the labial and 
dental consonants pertain to a common 
opposition: grave vs. acute [Jakobson, 
Fant, & Halle, 1951, p. 7]. 

The rather bold innovation of cross-classifying both 
consonants and vowels with the same set of features has been 
largely retained by generative phonology - though not for 
the purpose of minimizing the number of features, but to 
provide a notation to capture the assimilatory nature of 
certain consonant-vowel interactions (such as palatalization 
or rounding). The phonological function of distinctive 
features was not explicitly formulated by Jakobson, but has 
been the major criterion in the later revision of his scheme 
by generative phonologists. Whether the features that 
cross-classify (as distinct from those that merely separate) 
consonants and vowels make sense as perceptual dimensions 
is seriously doubtful. Jakobson, Fant, and Halle (1951) 
discuss the acoustic and perceptual characteristics of the 
two major cross-classificatory features separately for 
consonants and vowels. This alone seems to indicate that 
there are problems with gravity and compactness as singular 
and coherent perceptual dimensions that apply right across 
the consonant-vowel domain. 


Jakobson also economized on features by exploiting the 
fact that certain kinds of phonemic opposition never appear 
to co-occur in any one language. This enables superficially 
distinct oppositions to be collapsed into a single binary 
feature without restricting the range of phonemic 
oppositions that can be represented by the system. For 
example, the flat vs. plain feature is used for both 
phonemic labialization and pharyngealization (among others). 
Although from an articulatory standpoint it is difficult to 
imagine a more contrastive pair of articulatory gestures, 
Jakobson claims that their perceptual effects are so similar 
that no language could tolerate their coexistence as 
independent phonemic oppositions: 

The fact that peoples who have no 
pharyngealized consonants in their 
mother tongue, as, for instance, the 
Bantus and the Uzbeks, substitute 
labialized articulations for the 
corresponding pharyngealized consonants 
of Arabic words, illustrates the 
perceptual similarity of 
pharyngealization and lip rounding 
[1957, p. 31]. 

This example rather nicely points to the difference in 
orientation between the Jakobsonian and the Chomsky and 
Halle feature systems which was noted above. If features 
were in any sense to represent ideal targets or instructions 
for the articulatory apparatus, the collapsing of 
pharyngealization and labialization into a single flat 
vs. plain feature dimension would be clearly inappropriate. 
(Brain to Mouth: "Either purse your lips or constrict your 
pharynx." - cf. McCawley, 1972). 


Although it does not contribute to minimizing the 
number of features, but rather works to the contrary, 
Jakobson regarded the binary nature of distinctive features 
as fundamental: 


In the special case of speech... a set 
of binary selections is inherent in the 
communication process itself as a 
constraint imposed by the code on the 
participants in the speech event... The 
dichotomous scale is the pivotal 
principle of the linguistic structure. 
The code imposes it upon the sound 
f.195745° Da 91}. 
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On the basis of this and quotations made earlier one 
might be inclined to regard the binary principle as 
axiomatic and not as an empirical hypothesis. However, 
attempts are made to support the binary nature of 
distinctive features with psychological evidence (Jakobson 


and Halle, 1956), so this ambiguity, at least, is resolved. 


Jakobson and Halle's first argument is that the 
auditory system is maximally efficient when operating with 
binary feature dimensions: 

Recent experiments have confirmed that 
multidimensional auditory displays are 
most easily learned and perceived when 
'binary coded' [Jakobson & Halle, 1956, 
p. 47]. 

In fact, there is a dearth of relevant information on this 
admittedly important subject. The experiments to which 
Jakobson and Halle refer (Pollack & Ficks, 1954) are 
described by the authors themselves as "at best, only 
exploratory," and do not provide the right kind of 
information that is needed to test the binary coding 
hypothesis. Pollack and Ficks created multidimensional 
auditory stimuli of 6 and 8 dimensions using tone peeps and 
noise bursts. (The dimensions were created out of such 
variables as the frequency of the tone, the rate of 
alternation of tone and noise, and the overall duration of 
the stimulus.) Two, three, or five steps were created for 
each stimulus dimension, with the two steps in the binary 
condition defined by the extreme steps on the five-step 
scale and the three steps in the ternary condition by steps 
1, 3, and 5 on the five-step scale. All dimensions were 
binary for the eight-dimensional stimuli. Subjects worked 
with either binary, ternary, or quinary multidimensional 
stimuli. Their task was to identify the state of each 
stimulus dimension separately on any given stimulus 
presentation. Judgements were not made under time pressure. 
Neglecting details of the quantitative analysis, the 
authors' major results are of interest: 

The most striking finding is that the 
amount of information transmitted with 
these multidimensional auditory displays 
greatly exceeds that obtained under 
comparable conditions with 
unidimensional displays... 

The second finding is that there is 
proportionately little improvement in 
information transmission as each 
dimension is subdivided more 
finely... the more proficient subjects 
were able to take better advantage of 
the finer subdivision of each 
dimension... 

The third finding is that a further 
increase in the number of stimulus 
dimensions produced a still further gain 
in information transmission [1954, p. 
156]. 

The second and third findings are hardly surprising 
considering: (a) that the most contrastive steps were 
selected as the values for the binary condition, (b) the 
nature of the subject's task, which was to make a series of 
successive judgements about the stimulus without time 
pressure. The information transmission measure thus 
overlooked "the most important single variable in 
information transmission: time." Consideration (a) shows 
that the design of the experiment clearly favours the binary 
displays. Consideration (b) shows that it favours increasing 
the stimulus dimensionality. Arguably, what the experimenters 
should have done was to study the effect of stimulus 
dimensionality and dimensional subdivision upon the amount 
of information that can be transmitted per unit time. The 
authors of the experiment were aware of the limited 
generalizability of their results, but apparently Jakobson 
and Halle were not. 
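
The distinction between information per stimulus and information per unit time can be illustrated with a short calculation. The sketch assumes independent, equiprobable steps on each dimension, so that the information available in a stimulus is the number of dimensions times log2 of the number of steps; this gives an upper bound on what could be transmitted, not Pollack and Ficks' measured values. The judgement times in the rate example are hypothetical and serve only to show that a display carrying more bits per stimulus need not carry more bits per second.

```python
import math


def stimulus_information_bits(n_dimensions, steps_per_dimension):
    """Upper bound on information per stimulus, assuming independent,
    equiprobable steps on every dimension."""
    return n_dimensions * math.log2(steps_per_dimension)


def bits_per_second(bits_transmitted, seconds_per_stimulus):
    """Information rate when each stimulus takes the given time to judge."""
    return bits_transmitted / seconds_per_stimulus


# The four display types described in the text.
six_binary = stimulus_information_bits(6, 2)     # 6 dimensions, 2 steps
six_ternary = stimulus_information_bits(6, 3)    # 6 dimensions, 3 steps
six_quinary = stimulus_information_bits(6, 5)    # 6 dimensions, 5 steps
eight_binary = stimulus_information_bits(8, 2)   # 8 dimensions, 2 steps

# Hypothetical judgement times (seconds), for illustration only.
rate_binary = bits_per_second(six_binary, 3.0)
rate_quinary = bits_per_second(six_quinary, 9.0)
```

With these invented timings the six-dimensional binary display yields 2 bits per second while the quinary display, despite offering more than twice the information per stimulus, yields fewer. Only a rate measure of this kind could bear on the efficiency claim.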


Jakobson and Halle's second piece of "evidence" I will 
have to simply quote because it is quite obscure with 
respect to both intrinsic meaning and empirical implication: 

Second, the phonemic code is acquired in 
the earliest years of childhood and, as 
psychology reveals, in a child's mind 
the pair is anterior to isolated objects 
(Wallon H., 1945). The binary opposition 
is a child's first logical operation. 
Both opposites arise simultaneously and 
force the infant to choose one and 
suppress the other of two alternatives 
[Jakobson & Halle, 1956, p. 47]. 


Their third and final piece of evidence, involving an 
early experiment with vowel mixing (Huber, 1934), is too 
restrictive in scope to be worth considering in detail. 
Besides, as Jakobson and Halle admit, the results are 
equivocal. They disconfirm the binary hypothesis on the 
compact-diffuse dimension with English front vowels and (in 
Jakobson and Halle's eyes, but not this author's) confirm 
it with the vowels /u/ and /i/ "on the tonality axis." 


Perhaps the source of the binary principle lies in the 
traditional methodology of phonemic analysis which involves 
the examination of sets of minimal pair oppositions in an 
attempt to discern a number of elementary phonetic features 
that, by their presence or absence, can serve to subclassify 
phones into phonemic classes. The minimal pairs test may be 
thought to imply a binary opposition. However, this is of no 
theoretical significance, being merely a consequence of the 


analytical technique. 


It is not disputed here that, in terms of subjective 
perceptual impression, speech sounds have a quantal 
character. Moreover, there is a considerable body of 
evidence based upon natural and synthetic speech (Liberman 
et al., 1961; 1963; 1970; Lane, 1965; 1967, for review 
articles) that for certain kinds of phonemic discrimination 
(most notably, involving the stop consonants) perception is 
categorical in the sense that the listener's discrimination 
sensitivity is not constant along the relevant phonetic 
dimension, but peaks at phoneme boundary points. However, to 
admit the quantal nature of the subjective percept and the 
categorical response of the human auditory system in some 
areas of phonemic recognition in no way entails acceptance 
of the binary principle espoused most strongly by Jakobson. 


It is sometimes argued that binary features are 
necessary for phonological theory. Ladefoged (1971) and 
Contreras (1969) present some convincing arguments against 
this position and point out that the Chomsky and Halle 
(1968) phonology is not without multivalued feature 
specifications (the stress rules). Even if phonological 
theory required the binary principle, it would not follow 
that the perceptual dimensions involved in phonemic 
recognition must be binary. 


To summarize the argument thus far: The phonemic 
function of distinctive features is of primary importance in 
the Jakobsonian system. This function is relevant in that 
what people hear when they listen to speech is powerfully 
influenced by the phonemic pattern of their native language. 
There are three major constraints on feature postulation for 
Jakobson. Features must be binary, universal, and minimal in 
number. The binary principle is not well founded on either 
perceptual or linguistic grounds. The requirements of 
universality and minimal number are synergistically related. 
The fewer the number of features, the more powerful the 
universal linguistic implications of the theory appear to 
be, and the combination of the two constraints does in fact 
constrain a distinctive feature representation in a way that 
a language-specific or a numerically unlimited set of 
features would not. However, it is by no means clear that 
these constraints are appropriate, unless one is prepared to 
entertain extrinsic assumptions of the kind mentioned in 
Chapter I - that the human auditory apparatus is genetically 
constituted to perceive speech sounds in a unique and rather 
specific manner. It is not clear that this assumption is 
warranted or testable at the present time. 


One further point should be made in reference to the 
Jakobson, Fant, & Halle (1951) system, though it applies to 
all current linguistic feature systems. For purposes of 
constructing a perceptual model, the requirement that every 
feature have a readily specifiable and measurable acoustic 
correlate is both too weak and too strong. It is too weak in 
the sense that there are an indefinite number of readily 
measurable acoustic attributes of auditory stimuli (that may 
be employed for stimulus subcategorization), which may or 
may not be reliably detectable by the auditory apparatus. It 
is too strong in the sense that what is a simple and 
readily detectable auditory parameter to some biological 
sound analyser may be a highly complex integration of those 
parameters that the acoustic engineer is able to 
characterize within the limits of his instrumental 
technology. Neurophysiologists have become quite conscious 
of this embarrassing fact in recent years (Whitfield and 
Evans, 1965; Worden & Galambos, 1972, passim). 


Features in Generative Phonology 

Generative phonology has given particular importance to 
the phonological function of distinctive features. There are 
numerous illustrations available which show how the choice 
of features can be governed by phonological considerations. 
A notable source of such illustrations is the set of proposals 
that have been offered regarding some perceived inadequacies 
of the Sound Pattern feature system. For example, several 
writers have pointed to the inadequacy that the Sound 
Pattern feature system "does not permit us to formally 
express the fact that lip based sounds (+anterior, -coronal) 
and round sounds (+round) form a natural class [Ladefoged & 
Venneman, 1971, p. 14]." 


For example, the sound change /w/ -> /v/ is a widespread 
diachronic phenomenon. The naturalness condition (Chomsky & 
Halle, 1968, p. 335) stipulates that sounds which are prone 
to undergo such phonological alternations be economically 
representable by the feature system. Conversely, "unnatural" 
(rarely or never alternating) phone classes ought to be much 
more difficult to express by the feature notation. 
Similarly, it is quite common to find non-labial consonants 
assimilate to the labial point of articulation in the 
environment of a rounded vowel or glide (see Campbell, 1974 
for several examples of "labial attraction" rules). Where 
this assimilation does not alter the primary point of 
articulation, but merely introduces a "secondary" 
articulatory overlay of lip rounding, the Chomsky and Halle 
feature system works. However, changes in primary point of 
articulation are not uncommon, and for these cases the 
feature system fails to express the change economically, 
and, more importantly, does not capture its assimilatory 
character. The evident solution to this problem is to 
postulate a feature (labial) which encompasses bilabial and 
labiodental closure in consonants and lip rounding in vowels 
and glides. 


The question has been raised as to whether phonetic and 
phonological features are necessarily the same kind of 
entity. Ladefoged (1971a, b) has pointed out that among those 
features with the strongest phonological motivation are some 
that are not associated with any "single, measurable" 
acoustic or physiological property (such features as 
consonantality, labiality, or the stress feature). He 
proposes a categorical distinction between these features - 
referred to as "cover" features - and "primary" features 
that do have a measurable phonetic referent. 

Any empirical theory has to have a 
number of primitives which are definable 
in terms of concepts which belong 
outside the theory. In the case of 
phonological theory, these are prime 
features which are definable in terms of 
acoustic or physiological properties of 
sounds... In addition there are 
phonological features that are not 
themselves prime features but 
disjunctions of values of prime 
features; ...they are cover terms for 
certain values of related prime features 
[Ladefoged & Venneman, 1971, p. 13]. 

... the relationship between them [prime 
and cover features] is of the form 
indicated by feature redundancy rules. 
The number of prime features must, as in 
any theory, be minimal; but the number 
of derived features constructed from 
them must be sufficient so that we can 
give explanatory formulations of 
linguistic phenomena [1971, p. 23]. 
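
The idea of a cover feature as a disjunction of prime feature values can be sketched for the labial case raised earlier. The prime feature values below follow the general pattern of the Sound Pattern specifications for anterior, coronal, and round, but the small segment table and the function name `is_labial` are illustrative assumptions rather than a reproduction of any published matrix.

```python
# Illustrative prime feature values for a handful of segments.
PRIME = {
    "p": {"anterior": True,  "coronal": False, "round": False},
    "f": {"anterior": True,  "coronal": False, "round": False},
    "v": {"anterior": True,  "coronal": False, "round": False},
    "t": {"anterior": True,  "coronal": True,  "round": False},
    "k": {"anterior": False, "coronal": False, "round": False},
    "w": {"anterior": False, "coronal": False, "round": True},
    "u": {"anterior": False, "coronal": False, "round": True},
}


def is_labial(segment):
    """Cover feature [labial]: (+anterior, -coronal) OR (+round)."""
    f = PRIME[segment]
    return (f["anterior"] and not f["coronal"]) or f["round"]


labial_class = {seg for seg in PRIME if is_labial(seg)}
```

Stated this way, /w/ and /v/ fall into one class that a rule can name with a single feature, even though no single prime feature value is common to both.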

One possible formal objection to the use of cover 
features is that they appear to be insufficiently 
constrained by considerations that must necessarily be met 
if phonological rules are to have any explanatory power. In 
other words, the criteria for cover features do not seem to 
prevent the establishment of ad hoc feature classes that may 
be very convenient for the formulation of phonological 


rules, but may at the same time be quite "unnatural" from 


the standpoint of phonetic similarity. 


As both Fromkin (1976) and Ladefoged (1971) have 
argued, the "naturalness" of a particular feature or the 
explanatory basis for a given phonological rule cannot be 
established on formal grounds, but reduces to one of two 
classes of considerations: namely, considerations of 
perceptual similarity and contrast or considerations of 
production. For this reason, the phonetic basis of a 
distinctive feature system will be a mixture of auditory and 
articulatory considerations. "Some features will be more 
easily interpretable in one way, and others in the other 
[Ladefoged, 1971, p. 7]." 


It would appear then that cover as well as prime 
features should be justifiable on grounds of either 
production or perception. But does this entail that they 
must be reducible to "simple" scalar physical properties? It 
has already been argued that the acoustic correlates of an 
auditory-perceptual variable may be neither simple, nor - 
for technical reasons - accurately measurable, but 
nonetheless real. The same would appear to hold in the 
domain of production. The fact that articulatory based 
taxonomies for phonetic description have existed for 
centuries and that various systems show a good deal of 
correspondence should not be allowed to obscure the fact 
that very little is known about the relevant control and 
feedback parameters involved in speech production. Should an 
articulatory feature system which is optimal from the 
standpoint of a "universal phonetic theory" reflect the 
geometry of the vocal tract - some spatial representation of 
a set of ideal articulatory target points? If so, then from 
what is known about somatotopic representation in the 
central nervous system (Mountcastle, 1970) one would be led 
to suspect that the subjective geometry of the vocal tract 
is related to its physical proportions in some highly 
complex, non-linear fashion. Would it perhaps be better to 
base an articulatory feature system upon an analysis of 
synergistically operating groups of muscles? What role 
should be given to tactile as opposed to proprioceptive or 
auditory feedback? 


Ladefoged's "cover" versus "prime" feature distinction 
may be regarded as one way of resolving a long-standing 
controversy in phonological theory (Trubetzkoy, 1969), 
namely, whether phonologists need or ought to be concerned 
with the phonetic basis of speech in arriving at 
phonologically well-motivated sound classes (feature 
systems). In terms of the concerns of this paper, the 
question may be turned around to ask whether there is some 
plausible basis for the supposition that features posited on 
the basis of their phonological function have some relevance 
to a perceptual model. It has often been observed that not 
all the sound regularities described in phonological rules 
are readily explicable in terms of operational constraints 
on the perceptual or speech production systems. The appeal 
to "ease of articulation", for example, has most likely some 
validity for common assimilatory processes observed in many 
languages. On the other hand, many of the "phonotactic 
constraints" characteristic of a particular language appear 
to be quite arbitrary. These sequential constraints on sound 
combination introduce redundancies into the speech signal 
which could well be utilized by a perceptual mechanism 
having access to them through some internalized 
representation of the sound pattern of the language. Part of 
the phonological function of distinctive features is to 
provide an economical means of distinguishing such classes 
of permissible from non-permissible sound sequences (Halle, 
1962; Chomsky & Halle, 1965). 


Insofar as distinctive features are necessary or useful 
for rule writing and insofar as the rules capture 


information which could be utilized by a perceptual device, 
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to enhance speed and reliability of operation, particularly 
under "difficult" listening conditions, an indirect and 


features as components of a speech perception model. 


Consider the following argument: Many of the sequential 
constraints on sound combination in English (or any other 
language) are contingent upon the presence of linguistic 
boundary markers of one kind or another: morphemic 
boundaries, word boundaries, phrase boundaries, etc. (for a 
detailed discussion of the current status of boundary 
symbols in phonology, see Stanley, 1974). Of particular 
interest, largely because of the prominent role that they 
have played in discussions of generative phonology, are the 
class of rules variously known as "morpheme structure" rules 
or conditions (Stanley, 1970), or as "lexical redundancy 
rules" (Chomsky & Halle, 1968). The major function of such 
rules in generative phonology is usually characterized as 
one of economizing on lexical representations, or its 
complement, of maximally exploiting regularities in the 
sound sequencing of the language. For example, very few of 
the possible pairwise combinations of consonantal phonemes 
constitute permissible morpheme-initial consonant clusters 
in English. Hence many of the phonological features in 
lexical items containing an initial consonant cluster are 
predictable by rule and need not be entered in the lexicon. 
The relevance of this kind of "economic" consideration for a 
performance model has already been questioned (Chapter I). 

However, there is another way of conceiving the 
functional role of morpheme structure rules that seems to be 
more germane to the problem of speech recognition. The 
determination of morphological boundaries is one of the 
necessary analytical operations for any model of speech 
recognition that employs a word or morpheme lexicon. (Words 
and morphemes are not, of course, synonymous, but for 
purposes of the present discussion the distinction is not 
important.) Morphological boundaries must be at least 
tentatively determined in order to associate items in 
lexical storage with "portions" of the continuously changing 
incoming auditory signal. Just as boundary symbols are 
employed as part of the structural description of 
phonological rules to "predict" information about the 
phonological feature specification of lexical items in the 
language, so certain specific collocations of features may 
be used to establish linguistic boundary markers that might 
otherwise have no overt phonetic manifestation in the 
signal. 
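
How sequential constraints might be used to posit boundaries can be sketched as follows. The table of permissible morpheme-initial clusters is a small illustrative subset, not a description of English, and the function name `possible_boundaries` is hypothetical. The sketch considers only morpheme-initial restrictions and ignores whatever restrictions hold on morpheme-final sequences.

```python
# Illustrative subset of permissible morpheme-initial consonant clusters.
# The empty tuple stands for a morpheme that begins with a vowel.
LEGAL_INITIAL = {
    (),
    ("s",), ("t",), ("r",), ("l",), ("k",),
    ("s", "t"), ("t", "r"), ("k", "l"),
    ("s", "t", "r"),
}


def possible_boundaries(consonants):
    """Positions in a consonant sequence where a boundary could fall.

    A boundary at position i is admitted only if the consonants to its
    right form a permissible morpheme-initial cluster.
    """
    consonants = tuple(consonants)
    return [
        i for i in range(len(consonants) + 1)
        if consonants[i:] in LEGAL_INITIAL
    ]
```

For the sequence /t l/ position 0 is excluded, since /tl/ is not in the table of initial clusters, so any boundary must fall after the /t/. For /s t r/ every position is admitted and the constraint is uninformative. The listener's gain from such constraints therefore depends on how restrictive the cluster inventory is.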


However, phonotactic constraints may be represented 
quite adequately for the purposes of lexical economizing or 
linguistic boundary-marker assignment without any recourse 
to a subphonemic, distinctive feature system. They may 
simply be stated in terms of collocational restrictions on 
unanalysed segments (systematic phonemes) and certain 
linguistic boundaries. Conceivably, they may even be 
statable in terms of a (phonological) syllabary. 


Therefore, the argument for the plausibility of 
phonologically motivated features from the possible 
perceptual utility of the rules that they permit the 
phonologist to write is not a compelling, or even a strong 
one. And, as the phonological motivation for a feature is 
the only criterion for its postulation which is considered 
by the proposed "evaluation metric" of generative phonology, 
there is no reason to expect that the optimal feature set 
from the viewpoint of a generative phonology will also be 
optimal from a perceptual point of view. 


CHAPTER III 


PERCEPTUAL DIMENSIONS IN PHONEMIC RECOGNITION 


Even a superficial examination of the literature reveals 
that a broad range of methodologies and experimental 
rationales have been used in the attempt to isolate and 
describe the major perceptual factors involved in phonemic 
recognition. In the interests of a coherent presentation of 
the substantive findings it is necessary to briefly indicate 
each of these methodologies, stating their possible 
interrelationships and respective limitations. The review of 
the published findings in this chapter will focus upon the 
work with consonantal sounds and emphasise the rationale for 
selecting a particular data base and method of data 
collection. Some of the most interesting work has been done 
with vowel perception (Pols, van der Kamp, & Plomp, 1969; 
Terbeek & Harshman, 1971; Harshman, 1971). This work is 
primarily of methodological interest to the present study. 
It will therefore be mentioned in the context of the 
evaluation of Multidimensional Scaling (MDS) and related 
data reduction techniques in Chapter IV. 


Researchers differ considerably in their reliance upon 
phonological theory to provide the analytical framework for 
their studies. An important distinction can be drawn between 
those studies that rely upon some a priori feature scheme 
and those that derive features a posteriori from the data 
base. Most of the earlier studies relied upon an a priori
feature system:

... they could explain perceptual
patterns only in terms of the
predetermined attributes without having
the option of systematically exploring
the possibility of the perceiver's
utilization of more appropriate
attributes [Singh, 1974, p. 56].

The fundamental problem for researchers who derive
perceptual features a posteriori from the data is how
appropriate (well motivated) are the theoretical
presuppositions upon which the data reduction procedure is
based?


Perceptual confusions have provided the most popular 
data base for isolating perceptual features in phonemic 
recognition. This approach rests on the assumption that 
sounds which are similarly valued on whatever perceptual 
attributes the listener uses in their recognition or
discrimination will (other things being equal) show a higher 
probability of mutual confusion - particularly under 
difficult listening conditions of one kind or another. A 
confusion matrix in which the off-diagonal elements contain 
the relative frequency of misidentifications of each 
perceptual target with every other in the set should 
therefore contain (though not necessarily in readily 
accessible form) information about the set of underlying 
perceptual features involved. Because of the high
reliability of perception under normal listening conditions,
to generate analysable patterns of confusion scores in the
off-diagonal elements of the matrix, it is usually necessary
to subject the perceptual system to some kind of operational
stress - via signal masking, distortion, or attenuation, or
by arranging the task so that some performative limitation
of the subject is exceeded to the point where he makes
frequent errors. Miller and Nicely (1955), in their
classical study of consonantal perception under white noise
masking, and low and high-pass filtering, were the first to
explore this approach. Their extensive data base has been
used by numerous subsequent investigators (Wilson, 1963;
Johnson, 1967; Corcoran et al., 1968; Wish, 1970; Hollaway,
1971; Shepard, 1972; Smith, 1973).
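As a minimal sketch of how the off-diagonal cells of a confusion matrix might be turned into a symmetric proximity measure, the following Python fragment converts raw counts to conditional response proportions and averages the two directions of confusion for each pair. The averaging rule, the function name, and the toy counts are illustrative assumptions, not the procedure of any study cited here.

```python
def confusion_to_proximity(counts):
    """Turn a square matrix of confusion counts into symmetric proximities.

    counts[i][j] = number of times stimulus i was identified as response j.
    Assumption (illustrative only): the proximity of i and j is the mean of
    the two conditional confusion proportions p(j|i) and p(i|j).
    """
    n = len(counts)
    # Row-normalise: p[i][j] = relative frequency of response j to stimulus i.
    p = []
    for row in counts:
        total = sum(row)
        p.append([c / total for c in row])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                prox[i][j] = (p[i][j] + p[j][i]) / 2.0
    return prox


# Hypothetical counts for three syllables (rows: stimuli, columns: responses).
toy_counts = [
    [80, 15, 5],   # /pa/
    [20, 70, 10],  # /ta/
    [2, 8, 90],    # /ma/
]
proximity = confusion_to_proximity(toy_counts)
```

With these invented counts the pair /pa/-/ta/ receives a larger proximity than /pa/-/ma/, which is the kind of ordering information that the scaling techniques reviewed below attempt to represent spatially.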


An attractive feature of confusion matrices obtained 
under conditions of signal masking is that the experimenter 
can be reasonably assured the data will be uncontaminated by
extra-perceptual factors. However, when the subjects'
performative capabilities are placed under stress - for 
example, by an interpolated memory task (Wickelgren, 1966) - 
the experimenter no longer has this assurance. This 
objection has also been raised (Shepard, 1972) against 
another common method of measuring perceptual proximity - 


subjective estimates of perceptual similarity. 


Either direct or indirect similarity scaling may be
used to generate a perceptual proximity matrix. Direct
estimates of perceptual similarity may be obtained by a
variety of psychometric procedures. Subjects may be required
to make pairwise ratings of the stimuli on a categorical or
continuous scale of overall perceptual similarity.
Alternatively, their experimental task may be simply to
choose, between two alternative responses, the one which is
"most like" a given stimulus as standard.


A measure of perceptual similarity may also be
generated indirectly from ratings on a number of semantic
scales rather than a single overall similarity scale. The set
of phoneme targets may be rated by subjects with respect to
a number of verbal-descriptive scales, thought to capture
various perceptually relevant qualities of sounds of speech.
The degree of correlation between the profiles of scores on
the set of semantic scales for any two phonemes may be taken
as an index of the similarity of the two phonemes. Of
course, the accuracy of any such indirect estimate of
similarity is predicated upon an appropriate choice of
descriptive scales and relative scale weightings.
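The indirect index described above can be illustrated with a short Python sketch that correlates the rating profiles of two phonemes across a set of descriptive scales. The ratings, the number of scales, and the equal weighting of scales are all hypothetical and serve only to show the computation.

```python
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation between two rating profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical mean ratings of three phonemes on five verbal-descriptive
# scales (all scales weighted equally, which is itself an assumption).
ratings = {
    "s": [6, 2, 7, 1, 5],
    "z": [5, 3, 6, 2, 5],
    "m": [1, 6, 2, 7, 3],
}

# Profile correlation for every unordered pair of phonemes.
similarity = {
    (a, b): pearson(ratings[a], ratings[b])
    for a in ratings
    for b in ratings
    if a < b
}
```

In this invented example the profiles of /s/ and /z/ correlate strongly and positively, while /m/ and /s/ correlate negatively, so the indirect index would place the two sibilants close together and the nasal far from both.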


If the experiments are properly carried out, the 
results of direct and indirect similarity scaling should be 
highly correlated. Similarity scaling has the advantage over 
confusability indices of providing perceptual proximities 
under listening conditions that are free of gross signal 
distortion or some other abnormal, error-inducing factor. 
But, on the other hand, similarity scaling is based upon 


what is, for the listener, a rather artificial task.


Discriminatory reaction time (DRT) has recently been
explored as a measure of perceptual proximity (Weiner &
Singh, 1978), based upon the assumption that closely related
perceptual targets will require longer discrimination
latencies. The technique looks promising.


Perceptual Confusion Matrices


Miller and Nicely (1955) used an a priori feature
system to evaluate the effects of 17 experimental conditions
of white noise masking and signal filtering upon the
identification of 16 consonants embedded in a /Ca/ syllabic
frame (see Table 3.1 for features, conditions, and stimuli).


TABLE 3.1

DETAILS OF THE MILLER AND NICELY (1955) EXPERIMENT

Paradigm: Confusion matrices collected under conditions of
          (a) white-noise masking; 6 S/N ratios ranging
              from -18dB to +12dB, fr. response 200-6500Hz

          (b) low-pass filtering; 6 conditions, 200-300Hz,
              200-400Hz, 200-600Hz, 200-1200Hz, 200-5000Hz

          (c) high-pass filtering; 6 conditions, 1-5kHz,
              2-5kHz, 2.5-5kHz, 3-5kHz, 4.5-5kHz

Subjects: 5 female subjects who served as talkers and
          listeners

Stimuli:  /p,t,k,f,θ,s,ʃ,b,d,g,v,ð,z,ʒ,m,n/ in a /Ca/ frame


the relative amounts of information lost for each feature in
transmission under different listening conditions. For any
feature, the amount of information transmitted was (not
surprisingly) inversely monotonically related to the
severity of the white noise masking or the degree of
filtering. What is of interest, however, is the relative
imperviousness of some features compared with others, and how
the relative imperviousness of the features to transmission
loss is differentially affected by the three basic
conditions of white noise masking, high, and low-pass
filtering.
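The notion of information transmitted for a single feature can be made concrete with a small Python sketch. It collapses a confusion matrix into the classes defined by one feature and computes the transmitted information relative to the stimulus entropy, in bits. The four-consonant matrix and the binary feature codings are invented for illustration and are not taken from the Miller and Nicely data.

```python
from math import log2


def feature_transmission(counts, feature):
    """Relative information transmitted for one feature.

    counts[i][j] : number of times stimulus i drew response j.
    feature[i]   : value of the feature (e.g. 0/1) for item i.
    Returns T(x;y) / H(x), where x is the feature value of the stimulus
    and y the feature value of the response. The feature must take at
    least two values among the stimuli, otherwise H(x) is zero.
    """
    total = sum(sum(row) for row in counts)
    # Joint distribution over (stimulus feature value, response feature value).
    joint = {}
    for i, row in enumerate(counts):
        for j, c in enumerate(row):
            key = (feature[i], feature[j])
            joint[key] = joint.get(key, 0.0) + c / total
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    transmitted = sum(
        p * log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0
    )
    entropy = -sum(p * log2(p) for p in px.values() if p > 0)
    return transmitted / entropy


# Hypothetical confusions among /p, b, t, d/: place errors are common,
# voicing errors are rare.
toy_counts = [
    [60, 2, 36, 2],   # p
    [2, 60, 2, 36],   # b
    [36, 2, 60, 2],   # t
    [2, 36, 2, 60],   # d
]
voicing = [0, 1, 0, 1]
place = [0, 0, 1, 1]

t_voicing = feature_transmission(toy_counts, voicing)
t_place = feature_transmission(toy_counts, place)
```

For these toy counts voicing is transmitted far more reliably than place, which mirrors in miniature the kind of comparison between features that the analysis is designed to support.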


Under all but the most unfavourable condition of white
noise masking (-18dB, where the subjects' responses are 
virtually random) the features of nasality and voicing were 
better preserved than duration, affrication, and place of 
articulation. White noise primarily masks the auditory cues 
carried by the components with lower intensity, which for 
most speech sounds are in the higher frequency regions of 
the acoustic energy spectrum. Its effect on intelligibility 
is similar to that of low-pass filtering (as the Miller and 


Nicely data show). 


Miller and Nicely's analytical techniques were 
inadequate for the problem of deriving an optimal set of 
features that would account for the pattern of 
misidentifications among the off-diagonal elements of the 


confusion matrix. Wilson (1963) applied Torgerson's (1958)
MDS procedure to the -12dB S/N condition of the Miller and
Nicely data. This particular level of white noise masking
was chosen for being:

... low enough to permit a considerable
amount of differentiation between the
consonants yet high enough to permit
application of the analytical
techniques ... [Wilson, 1963, p. 89].

He derived two sets of interpoint distances from the
confusion matrix and (following Torgerson) reduced these two
16-dimensional configurations in both cases to four
dimensions by the Principal Axes method of factor analysis.
Wilson used two quite different formulae for deriving
interpoint distances from the frequencies in the confusion
matrix because of the unsolved problem of choosing an
appropriate function to relate raw data scores to distances.
This problem is particularly acute with the older "metric"
forms of MDS.
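For readers unfamiliar with the "metric" approach, the following is a minimal sketch of classical scaling in the spirit of Torgerson's procedure: squared distances are double-centred to give scalar products, and the principal axes of that matrix supply the coordinates. It assumes the NumPy library and a ready-made distance matrix; it does not reproduce either of the distance formulae Wilson applied to the confusion frequencies, and the four-point example is hypothetical.

```python
import numpy as np


def classical_mds(dist, n_dims=2):
    """Classical (Torgerson-style) metric scaling of a distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain scalar products.
    centring = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centring @ (d ** 2) @ centring
    # Principal axes: eigenvectors of the scalar-product matrix.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Negative eigenvalues (non-Euclidean input) are clipped to zero.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical check: four points at the corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
coords = classical_mds(dist, n_dims=2)
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

When the input really consists of Euclidean distances, as here, the recovered interpoint distances match the input up to rotation and reflection of the axes. The difficulty noted above arises because confusion frequencies are not distances, so the result depends on the function chosen to convert them.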


The two (orthogonal, unrotated) factor solutions agreed 
with one another and the Miller and Nicely findings to an 
interesting degree. For both MDS solutions (one based on
Shepard's (1957; 1958) distance formula, the other on
Wilson's own derivation) the first factor clearly
differentiated all the voiced from the voiceless consonants.
The nasals were clearly distinguished from the other
consonants by the second factor loadings for both solutions.
However, the two solutions do not both clearly indicate a
simple nasality factor:




For Shepard's measure, Factor II gives
sizable positive loadings to the longer
duration consonants, /s/, /ʃ/, /z/, /ʒ/
and to /d/ and /g/ and sizable negative
loadings to the two consonants /m/ and
/n/. Obviously, this is a complex and
not easily interpreted dimension. For
Wilson's measure, Factor II gives
sizable positive loadings to the two
nasal consonants and near zero loadings
to the others and so is obviously a
nasality factor [1963, p. 93].

There was no clearly discernible agreement between the two
solutions for factors III and IV. Nor did these dimensions
appear to be readily interpretable. (But Wilson did not
attempt to rotate the factor axes in order to improve the
interpretability or maximize the agreement between his two
solutions.)


Shepard (1972) reanalysed the Miller and Nicely data 
using a "non-metric" MDS technique (Kruskal, 1964a,b) and 
Johnson's (1967) method of hierarchical clustering. By
pooling the six confusion matrices elicited under white 
noise masking, he obtained an optimal two-dimensional 
solution, accounting for 99% of the variance (see Figure 3.1 
below). Shepard's MDS solution shows essential agreement 
with Wilson's findings and further indicates the prominence 
of the features of voicing and nasality under white noise 
masking. However, while Shepard's results indicate that a 
considerable dimensional reduction of the confusion matrix 
is achievable, the distribution of sounds along the 
reference axes does not unambiguously favour a simple,
orthogonal, "two feature" interpretation. (Consider the
second dimension. It is intuitively obvious that /b/ and /f/
are less "nasal" than /m/ or /n/ but it is by no means clear
that this same quality even better differentiates /f/ and
/b/ from /ʒ/ and /z/ as a straightforward interpretation of
the configuration would imply.)


Fig. 3.1 Reanalysis of Miller & Nicely (1955) data.
From Shepard (1972).


Superimposed on the configuration (Figure 3.1) are the 3 and 5
group levels that the hierarchical clustering analysis
suggested were the most reliable subgroupings of the
consonants. Beyond the two features of voicing and nasality,
the hierarchical clustering solution is not predictable from
Miller and Nicely's a priori feature set.


Detailed analysis of the effect of masking level showed
that while varying the level of white noise masking
predictably affected the overall level of confusion, the
recurrence of the same clusterings at each S/N ratio
indicated that:

The internal pattern of confusions was
essentially invariant. Indeed with
respect to the spatial representation,
the effect of adding a given amount of
white noise seemed to be almost entirely
confined to a reduction of all
interpoint distances by the same
constant factor [Shepard, 1972, p. 109].


With respect to the other listening conditions: 


Generally the pattern resulting from
low-pass filtering is remarkably like
the pattern resulting from the addition
of broadband noise... the only notable
difference seems to be that /f/ and /θ/
group with the unvoiced stops /ptk/ in
the "flat" conditions but with the other
unvoiced fricatives /sʃ/ in the low pass

conditions. (1972, p. .10%). 
However, the pattern that emerged through cluster analysis 
of the confusions under the high-pass conditions differed 
"radically" from that obtained under white noise masking and 


low-pass filtering.


It may be useful to conceptualize the effect of 
different kinds of filtering and masking as differentially 
affecting the weightings of a set of underlying features - 


enhancing the relative prominence of some features under one
condition and suppressing or erasing other features normally
used in phonemic recognition. Carroll and Chang's (1970)
INDSCAL method of MDS makes just this kind of assumption.
INDSCAL, a three-mode scaling procedure, derives a "group"
configuration on the assumption that the same dimensions are
operative in each of the "individual" matrices that comprise
the third mode, though the relative weightings of the
dimensions in determining the interpoint distances may vary
across individuals. Wish (1972) applied INDSCAL analysis to
all 17 of the Miller and Nicely matrices. In addition to the
dimensions of Nasality and Voicing (Dimensions I and II
respectively), several additional factors were extracted,
namely: "voiceless stop vs. voiceless fricative"
(duration?); "second formant transition"; "sibilance"; and
"sibilant discrimination".


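The weighted distance model that underlies INDSCAL can be sketched in a few lines of Python. Each listening condition shares one group configuration but applies its own non-negative weight to every dimension before distances are computed. The coordinates, dimension labels, and weights below are hypothetical; fitting them to data requires the iterative INDSCAL procedure itself, which is not shown.

```python
from math import sqrt


def weighted_distance(x_i, x_j, weights):
    """Distance between two stimuli in one condition's weighted space.

    x_i, x_j : coordinates of the stimuli in the shared group configuration.
    weights  : non-negative salience of each dimension for this condition.
    """
    return sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x_i, x_j)))


# Hypothetical group configuration on two dimensions (voicing, nasality).
group = {"p": (-1.0, -0.5), "b": (1.0, -0.5), "m": (1.0, 1.5)}

# Hypothetical weights for two listening conditions.
weights_a = (1.0, 0.25)   # voicing dimension salient, nasality weak
weights_b = (0.25, 1.0)   # nasality dimension salient, voicing weak

d_pb_a = weighted_distance(group["p"], group["b"], weights_a)
d_bm_a = weighted_distance(group["b"], group["m"], weights_a)
d_pb_b = weighted_distance(group["p"], group["b"], weights_b)
d_bm_b = weighted_distance(group["b"], group["m"], weights_b)
```

Under the first set of weights /p/ and /b/ are twice as far apart as /b/ and /m/; under the second the relation is reversed, although the group configuration is unchanged. This is the sense in which filtering or masking may be thought of as reweighting a fixed set of underlying dimensions.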
All reported analyses of Miller and Nicely's data 
concur in attributing perceptual prominence to the Voicing 
and Nasality features. Two questions however arise that bear 
on the generalizability of these findings. (a) To what 
extent does the preponderance of high frequency attenuation 
in Miller and Nicely's experiment over-enhance the effect of 
low frequency signal components and thus yield a distorted 
picture of perception under normal listening conditions? (b) 
What impact does Miller and Nicely's particular choice of 


consonants have on the results of the experiment?
Specifically, the lack of resonant sounds in the data set
raises the question of whether the prominence of "nasality"
is not at least in part attributable to a more general
auditory feature separating not only the nasals, but also
the glides and liquids from sounds characterised by a
turbulent noise source.


Wang and Bilger (1973) have recently reported a study 
of perceptual confusions under white noise masking and (flat 
frequency) signal attenuation which permits a partial answer 
to (a) and (b) above. They examined consonantal confusions 
with four sets of stimuli under (i) different S/N ratios of 
white noise masking and (ii) different signal levels under 


quiet listening conditions (see Table 3.2 for details). 


TABLE 3.2

DETAILS OF THE WANG & BILGER (1973) EXPERIMENT

of: (a) white noise masking; 6 S/N ratios
        ranging from -10dB to +15dB

    (b) signal attenuation without noise
        masking ("quiet" condition)

Subjects: 16 paid volunteers assigned to 4 listening groups,


Wang and Bilger also employed an a priori set of
features to analyse subjects' perceptual confusions. However,
their analytical method was more powerful than that of
Miller and Nicely. In addition to simply comparing the
information transmission levels for different features in
order to distinguish those that are perceptually prominent
(stable) from those that are weak (subject to transmission
loss), Wang and Bilger were concerned with minimizing the
internal redundancy of the feature set as a whole. To this
end they developed:

... a sequential method of analysing
transmitted information which
systematically identifies from among a
number of features, those on which
performance is high, and which takes the
internal redundancy of the features into
account in doing so. This is
accomplished by partialling out, in each
iteration, the effects of features
identified in earlier iterations. The
analysis also allows us to determine
what proportion of the total transmitted
information is accounted for by the
features identified as perceptually
important. The procedure and the
rationale behind it may be loosely
interpreted as the information analogue
of a stepwise multiple regression
analysis [Wang & Bilger, 1973, p. 1249].


They were thus able to input a large and highly redundant
set of features into the analysis without risk of losing
those features that make a significant independent
contribution to the identification of the consonants, as
distinct from those whose reliability of transmission may be
accounted for by their covariation with one or another more
reliably transmitted feature. The feature set used by Wang
and Bilger was simply a combination of those of Chomsky and
Halle (1968), Singh and Black (1968), Wickelgren (1966), and
Miller and Nicely (1955).
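The logic of partialling out previously identified features can be illustrated, under strong simplifying assumptions, by a greedy selection based on conditional transmitted information. The sketch below is not Wang and Bilger's published algorithm: the conditioning scheme (partitioning stimuli by their values on the features already selected), the fixed stopping threshold, and the toy data are all assumptions introduced here for illustration.

```python
from math import log2


def conditional_transmission(counts, feature, held):
    """Relative information transmitted for `feature`, holding `held` constant.

    counts[i][j] : number of times stimulus i drew response j.
    feature      : feature value of each item.
    held         : list of feature-value lists selected in earlier iterations.
    Stimuli are partitioned by their values on the held features; the
    conditional transmitted information I(x;y|z) is divided by the
    conditional stimulus entropy H(x|z).
    """
    total = sum(sum(row) for row in counts)
    joint = {}
    for i, row in enumerate(counts):
        z = tuple(h[i] for h in held)
        for j, c in enumerate(row):
            key = (z, feature[i], feature[j])
            joint[key] = joint.get(key, 0.0) + c / total
    pz, pzx, pzy = {}, {}, {}
    for (z, x, y), p in joint.items():
        pz[z] = pz.get(z, 0.0) + p
        pzx[(z, x)] = pzx.get((z, x), 0.0) + p
        pzy[(z, y)] = pzy.get((z, y), 0.0) + p
    transmitted = sum(
        p * log2(p * pz[z] / (pzx[(z, x)] * pzy[(z, y)]))
        for (z, x, y), p in joint.items()
        if p > 0
    )
    entropy = -sum(p * log2(p / pz[z]) for (z, x), p in pzx.items() if p > 0)
    return transmitted / entropy if entropy > 0 else 0.0


def sequential_selection(counts, features, threshold=0.01):
    """Greedy ranking of features by conditional transmitted information.

    `threshold` is a hypothetical stopping value standing in for a proper
    significance criterion.
    """
    remaining = dict(features)
    held, ranking = [], []
    while remaining:
        scores = {
            name: conditional_transmission(counts, values, held)
            for name, values in remaining.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] < threshold:
            break
        ranking.append(best)
        held.append(remaining.pop(best))
    return ranking


# Hypothetical confusions among /p, b, t, d/ and a deliberately redundant
# feature set: "voiced_copy" duplicates "voicing".
toy_counts = [
    [60, 2, 36, 2],   # p
    [2, 60, 2, 36],   # b
    [36, 2, 60, 2],   # t
    [2, 36, 2, 60],   # d
]
features = {
    "voicing": [0, 1, 0, 1],
    "place": [0, 0, 1, 1],
    "voiced_copy": [0, 1, 0, 1],
}
ranking = sequential_selection(toy_counts, features)
```

Because the duplicate feature carries no information once voicing has been held constant, it drops out of the ranking, whereas place survives on the strength of its small independent contribution. This is the property that allows a large and redundant feature set to be entered without inflating the apparent importance of covarying features.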


Table 3.3 lists, in rank order, the features that
emerged as significant for the different stimulus sets under
conditions of white noise masking (averaged over all S/N
ratios). Column one of Table 3.3 gives the results of
applying Wang and Bilger's algorithm to Miller and Nicely's 


pooled conditions of white noise masking. The prominence of 


TABLE 3.3

PERCEPTUAL SALIENCE OF FEATURES - WHITE-NOISE CONDITION
(WANG & BILGER, 1973)

All features that made a significant (independent)
contribution in the Sequential Information Analysis (see
above) are listed in rank order.


data where these features are applicable. (No nasals are
included in the CV-1 or VC-1 sets.) Unfortunately, the
question raised earlier with respect to a more general
resonance feature remains unanswered because not one of the
19 input features serves to group the resonants together. It
is notable, however, that the only stimulus set containing a
mixture of nasals and other resonants (set CV-2) yields
nasal, vocalic, and round as its three strongest features
(i.e., the subsets /m,n/; /l,r/; /h,w/).


Other features that emerged prominently under white
noise masking were: Open (for a definition of this and any
other feature mentioned elsewhere in this paper see the
alphabetic listing in Appendix A), Sibilance, and
Continuance.


Returning to the question of the perceptual
over-enhancement of low frequency signal components under white
noise masking, Wang and Bilger observed that:

Voicing and nasality are well perceived
in the presence of masking, but their
intelligibility drops relative to that
of other features in quiet [Wang &
Bilger, 1973].

Singh (1971) observed the same effect for the perception of
minimal pair phonemic differences under noisy vs. quiet
conditions.


Singh and Black (1966) obtained perceptual confusion
matrices on 22 intervocalic consonants as spoken and heard
by native speakers of four different language groups (Hindi,
English, Arabic, Japanese). An a priori set of seven
features (Voicing, Nasality, Liquid, Aspiration, Frication,
Place, Duration) was used in an information transmission
analysis identical to that of Miller and Nicely.

The single striking outcome of the
present study lies in the rank orders of
the seven channels in the relative
amounts of information per channel -
that is in the "importance" of the
channels. A single rank order in this
regard obtains for all the listening
groups: (1) nasality (2) place (3)
liquid (4) voicing (5) duration (6)
frication and (7) aspiration [Singh and
Black, 1966].

However, the application of Wang and Bilger's expanded
feature set and analytical algorithm (which takes account of
the internal redundancies of the features) to the Singh and
Black data suggests a rather different ranking of features in
terms of perceptual prominence (see Column 1 of Table 3.4).
Most notably, controlling for internal redundancy lowers
the salience of the Place feature and raises that of
Frication.


TABLE 3.4

PERCEPTUAL SALIENCE OF FEATURES - "QUIET" CONDITION
(WANG & BILGER, 1973)

Wang & Bilger                                   Singh &     Graham &
CV-1        VC-1        CV-2        VC-2        Black       House
High        Sibilant    Round       Nasal       Nasal       Sibilant
Voicing     Duration    Vocalic     High        Vocalic     Duration
Back        High        Nasal       Back        Frication   Frication
Sibilant    Voicing     Sibilant    Duration    Back        Voicing
Frication   Continuant  Duration    Voicing     Voicing     Anterior
Duration    Place(W)    Anterior    Frication   Place(SB)   Round
Place(W)    Coronal     Place(W)    Open                    Conson.
                        Voicing     Place(W)                Nasal
                        Open                                Open
                                                            Place(W)


Graham and House (1971) studied errors in consonantal
discrimination made by young children (aged 4 years) who
were asked to give "same or different" judgements to pairs
of orally presented CV syllables. The 16 consonants used by
Graham and House are given in Figure 3.2, which shows the
two-dimensional configuration obtained by Singh, Woods, and
Tishman's (1971) MDS reanalysis of the original confusion
matrix. Dimension I of Figure 3.2 appears to be a temporal
factor separating the stops from the continuant sounds.
Dimension II, on the other hand, clearly differentiates the
sibilants from the resonant consonants.


The MDS configuration for the children's perceptual
errors under non-noisy listening conditions is quite
different from that obtained (by the same scaling technique)
from adults' phonemic misidentifications elicited under
white noise masking (Figure 3.1). The difference between
these two scaling solutions may be tentatively accounted for
by (1) a rise in the perceptual salience of the durational
characteristics of the speech signal under non-noisy
listening conditions, together with (2) a reassertion of the
turbulent noise characteristics which would be most adversely
affected by white-noise masking, and (3) a relative
diminution of the prominence of the low frequency signal
components in the absence of high frequency masking noise.


Fig. 3.2 Reanalysis of Graham and House (1971) data:
Phonemic Confusions of Young Children.
From Singh and Woods, 1971.


In short, discrepancies between the two configurations
may be attributable to the differences in listening
conditions rather than the obvious task and subject
differences. This is a rather bold interpretation, but it is
supported by internal comparisons within the Wang and Bilger
data. Compare the salience of the Duration feature for the
four Wang and Bilger stimulus sets in Table 3.3 (white-noise
condition) with Table 3.4 (quiet condition).


Duration appears as a significant feature in all four
of the stimulus sets in the "quiet" condition where errors
were induced by lowering the signal level. However, it never
appears as a significant perceptual feature for these same
sets of stimuli under the noisy listening condition in which
errors were induced by lowering the S/N ratio. The same
effect may be demonstrated by following the changes in rank
status of the duration feature across the six increasing S/N
ratios (see Table 3.5).


TABLE 3.5

PROMINENCE OF DURATION AS A FUNCTION OF
S/N RATIO (WANG & BILGER, 1973)

Syllable   -10dB   -5dB   0dB   5dB   10dB   15dB
set

CV-1        ns      ns    ns    ns    “f      5
VC-1        ns      ns    ns    5     2       1
CV-2        ns      ns    ns    ns    ns      5
VC-2        ns      ns    ns    ns    ns      5

n.b. Numerals represent rank status of Duration feature
among others that emerged as significant in the Sequential
Information Analysis. The characters ns indicate when the
feature failed to make a significant contribution under a
given level of white-noise masking.


In the case of the Voicing feature, the opposite trend is
apparent:


TABLE 3.6

PROMINENCE OF VOICING AS A FUNCTION OF
S/N RATIO (WANG & BILGER, 1973)

Syllable   -10dB   -5dB   0dB   5dB   10dB   15dB
set

CV-1        1       1     1     1     1       2
VC-1        1       1     1     1     1       2
CV-2        1       1     4     5     6       8
VC-2        2       2     2     2     2       4


listening conditions alone may account for differences in 
derived perceptual configurations and feature weightings. We 
would expect, for example, that duration should have emerged 
as a significant feature in Wang and Bilger's reanalysis of 
Singh and Black's cross-language confusion data (Table 3.4 
above). 


Perceptual Proximities via Similarity Scaling 




Peters (1963) appears to have done the first
experimental study of perceptual features in phonemic
recognition utilizing MDS of subjective similarity
judgements. Peters employed two sets of stimuli - one
comprising 28 consonants in a /Ca/ frame and the other
identical with the Miller & Nicely set. Procedures differed
slightly for obtaining similarity ratings on the two sets of
stimuli. On the set of 16 consonants, subjects made pairwise
ratings of the syllables on a 9-point scale of overall
perceptual similarity, shortly after having spoken the
syllables aloud. The subjects' raw score ratings were
entered in a 16 x 16 "similarity matrix" and treated as
absolute distances for input into Torgerson's (1958) scaling
procedure. Only three subjects judged the 28 stimulus set.
Nine were used for the 16 syllable set and separate scaling
solutions were obtained for each subject. The obtained
solutions ranged in dimensionality from 2 to 5.

Examination of the data indicated that
two or three dimensions were relevant
and the higher dimensions, when they
appeared, did not seem to be necessary
for adequate description of the data
[Peters, 1963, p. 1987].

Peters, "in keeping with previous work," interpreted his
results in articulatory terms:

The results indicate that manner,
voicing, and place of articulation, are
of importance in this respective
order ... [1963, p. 1988].

The major grouping of the consonants was
by manner, with either place or voicing
represented as a within group dimension
[1963, p. 1987].

The reliability of the individual configurations is rather
questionable but the accuracy of Peters' observations is
supported by Shepard's (1972) reanalysis of the pooled
proximity matrix derived from the nine subjects who judged
the 16 consonants. Hierarchical clustering analysis yielded
stable clusters at the 4 group and the 8 group level (see
Figure 3.3).


MANNER GROUPING (4 CLUSTER LEVEL):  stops | sibilants | soft fricatives | nasals

PLACE GROUPING (8 CLUSTER LEVEL):   pb, td, kg | ʃʒ, sz | fv, θð | mn

Fig. 3.3 Hierarchical Clustering Analysis (Shepard, 1972)
of Peters' (1963) pooled similarity matrix


to the clusterings yielded by the Miller and Nicely data and 
other confusion matrices (Singh and Black, 1966; Wang and 
Bilger, 1973) is the weakness of the voicing dimension. 
Unfortunately, Shepard did not subject Peters' pooled
similarity matrix to an independent MDS analysis, but simply
embedded the above clusters in the two-dimensional space of
the Miller and Nicely data. (And Peters does not supply the
raw data matrices for individual subjects.) It is difficult,
therefore, to form a clear idea of the basis that the
subjects may have used for their judgements which led to
the formation of clusters along the lines of traditional
manner of articulation categories. Inspection of the
individual scaling solutions in two dimensions, which Peters
does present, shows that in the majority of cases, one axis
polarizes the stops and the fricatives and the other
orthogonal axis tends to separate the nasals from the other
consonants. The configuration underlying the clustering
would seem to be not unlike that of the Singh et al. (1973)
reanalysis of the Graham and House (1971) data (Figure 3.2).
Once again, it is tempting to attribute the apparent
weakness of Voicing to the absence of high frequency
masking. However, Shepard (1972) suggests that other factors
may be operative:

One hypothesis that might explain this
apparent suppression of the usually
salient feature of voicing... is that
Peters' subjects treated this task as an
analogy task rather than a pure
similarity task
[Shepard, 1972, pp. 107-].

Analogical reasoning, in which the subject systematically
disregards otherwise prominent perceptual qualities of the
stimuli, could have taken place. But this hypothesis raises
the problem of explaining why subjects would consistently
choose to disregard this particular perceptual feature. If
analogical reasoning, essentially independent of perceptual
processes, were affecting the subjects' performance, it
seems more reasonable to expect that this would simply
introduce further heterogeneity or "noise" into the data. A
third hypothesis (perhaps linked with the second) is that
subjects relied significantly upon articulatory cues in
making their similarity judgements; cues which are not
employed in an identification task. Indeed, Peters'
procedure favoured the use of articulatory cues, and one
would expect Voicing to be a relatively "unmarked" feature
in terms of tactile or proprioceptive feedback. Further data
are obviously needed to resolve these questions.


Black (1968) collected estimates of subjective
similarity between pairwise combinations of 24 consonants
embedded in a /CV/ frame. Five vowels were used and 24
speakers each recorded a subset of the stimuli. On any given
trial the pair of stimuli differed only with respect to the
consonant and not the vowel or the speaker. (No motivation
was given for this peculiarity in the experimental design.)
A group similarity matrix was obtained and subjected to
factor analysis (with varimax rotation). Black extracted no
fewer than 12 factors. Factor I emerged as bipolar and he
interpreted it as "sonority or a smooth-rough dichotomy."
Factor II showed heavy positive loadings on the glides but
also had a significant negative loading on the affricate
/č/. Factor III was monopolar, showing significant positive
loadings exclusively on the "soft" fricatives /f,θ,ð,v/.
Factor IV appeared to be a sibilant dimension (hard
frication). Factors V to VII separated out the pairs of
stops /k,g/, /t,d/, and /p,b/ respectively, from the other
consonants. The loadings were too few and too weak to
justify interpretation of any of the other factors.


Because of questionable technical procedures that Black
employed, his data were reanalysed for this study (with a
principal components analysis and varimax rotation). The
pattern of eigenvalues obtained from the principal
components analysis indicated that no more than seven
dimensions were justified by the data. Factor I clearly
opposed the "hard" fricatives to a group of resonants that
included /m/. Factor II also appeared to be a hiss-resonant
dimension, but this time opposing the "soft" fricatives
to the resonants /l,y/. Factor III was bipolar and
difficult to interpret as a single dimension, with strong
positive loadings on /t,d/ and negative loadings on
/w,hw,t/. Factor IV separated the sibilants /s,z/ and,
though less strongly, /S/ from the other consonants. Factor V
clearly differentiated the two nasals. Factors VI and VII
separated out the pairs of stops /p,b/ and /k,g/
respectively (see Appendix B for further details of the
analysis).


To gain an overview of the perceptual relationships
between the consonantal targets, it is useful to graph the
first two principal component factors, which together
account for 56.08% of the common factor variance and 47.69%
of the total variance (see Figure 3.4). The graphical
representation of the perceptual relationships between the
consonants (3.5 above), and the pattern of factor loadings,
both in the original study and the reanalysis, show a high
degree of general agreement between Black's (1968) data and
those of Peters (1963), though such agreement could easily
be obscured by superficial differences in analytical
technique.


The plot of the first two principal components shows
the major grouping by "manner of articulation" that Peters
observed and Shepard (1972) later corroborated with
clustering analysis. The major "resonant" group is
consistent with Peters' "nasal" cluster - /m/ and /n/ being
the only representatives of this group included in the 16
Miller and Nicely consonants. The "stops" would be separated
from the "soft fricatives", but for the fact that the third
component is not shown in the graph (3.4), thus yielding a
basic four-cluster configuration of stops, sibilants, soft
fricatives, and resonants. Varimax factors V to VII in
Black's analysis and factors III, VI, and VII in the
reanalysis further corroborate the weakness of the voicing
feature in the scaling of similarity judgements under
non-noisy listening conditions.

Figure 3.4. Reanalysis of Black's (1968) Data
Similarity Rating of Consonantal Phonemes
Principal Components Analysis



Pruzansky (1971) used a computer-controlled sorting
apparatus to initiate the presentation of 16 /Ca/ syllables
(the Miller and Nicely set). Subjects located the sounds
with respect to one another by arranging 16 pegs (one for
each syllable) on a 16 x 16 pegboard. The Euclidean
distances between pegs in the subject's final configuration
on the board served as input to the Carroll-Chang (1970)
INDSCAL MDS program. The author reports:

The resulting group stimulus space
revealed a separation between
continuants and stops. Nasals were
clustered. Pairs of stops differing only
in voicing were grouped together
[Pruzansky, 1970, p. 85].

Jeter and Singh (1972) studied perceptual similarity
judgements of eight English consonants /p,b,t,d,f,v,s,z/
separately presented under either the auditory or the visual
mode ("phonemic" vs. "graphemic" similarity). The ABX trial
method was used to obtain auditory similarity ratings. The
two group (n=60) similarity matrices (the phonemic and the
graphemic) were scaled by Kruskal's method of MDS. The
"stress" rating - Kruskal's index of goodness of fit between
the input proximities and the derived distances - for the
auditory mode matrix, which is of primary interest here, was
high: too high in fact to generate confidence in the
reliability of the final solution. Dimension I of the
derived three-dimensional configuration:


... could be clearly identified as a
manner feature: all stops were separate
from all continuants [Jeter & Singh,
1972].
Dimensions II and III resisted clear interpretation.
Multiple regression analysis showed that a three binary
feature system of: place (labial - non-labial), manner (stop
- continuant), and voicing best predicted - in that order -
the derived interpoint distances for the auditory mode set
(R=0.677). The resonant dimension is notably absent from the
stimulus set.


Singh, Woods, and Becker (1972) reported a much larger
study of perceptual similarity scaling, using 22 consonants
and three similarity scaling methods (equal-appearing
interval scaling (SF), magnitude estimation (ME), and
triadic judgements (ABX)). A group matrix was accumulated
for each of the scaling methods. The three group matrices
were analysed separately by Kruskal's (1964) method and
compositely by Carroll & Chang's INDSCAL method. Five
dimensions were extracted from the INDSCAL analysis (see
Table 3.6). Substantial differences between the derived
perceptual configurations under the three scaling methods
were noted. These differences are partly characterised by
the relative weightings of the INDSCAL features for each of
the group matrices (Table 3.6). There seems to be some
question about the uniqueness and stability of this solution:

TABLE 3.6

INDSCAL DIMENSIONS AND WEIGHTINGS
FOR THREE SCALING METHODS
SINGH, WOODS, AND BECKER, 1972.

------------------------------------------------------------
                             DATA COLLECTION WEIGHTS
DIMENSION                    SF       ME            ABX
------------------------------------------------------------
SIBILANT - NON-SIBILANT      0.349    0.380         0.562
FRONT - BACK                 0.351    0.401         0.2911
PLOSIVE - NON-PLOSIVE        0.277    0.260         0.310
VOICELESS - VOICED           0.306    [illegible]   0.217
NASAL - NON-NASAL            0.315    0.248         0.174
------------------------------------------------------------


The scaling was repeated in five
dimensions with several different
starting configurations. The clearest
interpretation was found for the five
dimensional space whose interpoint
distances correlated at 0.78 with the
data [Singh et al., 1972, p. 1709].

Differences between the three scaling methods are best
observed when the three group matrices are scaled
separately. Singh et al.'s criterion for determining the
optimal dimensionality for the MDS analysis of the three
group matrices is questionable. For all three matrices,
Kruskal's stress criterion, and the plot of the tau
correlation between derived distances and input
similarities, would seem to indicate a two-dimensional
solution (see Singh et al., 1972, p. 1704) and not the three
and four dimensional solutions that the authors chose.
Consequently, Singh et al.'s data were reanalysed with a
Kruskal MDS routine (Euclidean distance metric). The
resulting two-dimensional configurations for the three sets
of data are given in Figure 3.6 below.

Conclusions


The development of methods for establishing perceptual
features involved in phonemic recognition, a posteriori, on
the basis of a set of experimentally generated proximities,
has freed the researcher from a certain amount of conceptual
tyranny exercised by traditional phonetic taxonomies.
However, it has by no means supplanted commonly used
phonetic categories.


None of the traditional a priori features can claim
ubiquitous perceptual prominence across all methods that
have been used to estimate perceptual proximities, though
some, clearly, are generally more important than others.
Nasality is a particularly strong feature, though much of
its prominence may be attributable to a more general
Resonance factor with which it is confounded in the set of
16 Miller and Nicely phonemes employed in many of the
published studies. Voicing emerges as a strong perceptual
feature under white-noise masking. There is some
disagreement about its status under quiet listening
conditions. The Sibilants show a strong tendency to cluster
together under non-noisy listening conditions. There is
virtually no support for a feature such as Stridency which
groups both the "strong" and the "weak" fricatives. Place of
Articulation, which has always been problematical from the
viewpoint of phonetic description, is also unclear
perceptually. The Labial-Nonlabial and the High-Nonhigh
(palatovelar vs. non-palatovelar) contrasts appear to have
significant status, at least in proximity measures based
upon similarity judgements. There is no apparent
justification for grouping labials and velars (Jakobson's
grave feature) on the basis of similarity ratings or
confusion matrices - the familiar acoustic justification
notwithstanding.


The Stop-Continuant or Duration feature appears to be
strong under quiet, but weak under noisy, conditions. It
appears to be one of the factors primarily responsible for
the observation that under good listening conditions, the
consonants tend to cluster in accordance with traditional
Manner of Articulation categories.


Many, but by no means all, of the apparent discrepancies
in the reported studies of perceptual relationships among
the consonantal phonemes can be readily attributed to
peripheral signal masking effects, or to the composition of
the experimental stimulus set. In particular, more
information is needed about the impact of the choice of
scaling technique upon the stability of the derived
perceptual configuration.


The independence from extra-perceptual processes of
proximity ratings derived from similarity judgements is
questionable and by no means demonstrated. On the other
hand, the dependence of proximity ratings derived from
confusion matrices upon the particular kind of stress used
to generate analysable error patterns is well established. 


Current research has not developed to the point where
perceptual distances are predictable as performance
characteristics of some model that samples as input certain
critical acoustic parameters of the signal. This is arguably
the goal towards which future research ought to be directed.

CHAPTER IV
ELEMENTS OF MULTIDIMENSIONAL SCALING 


In its raw form the proximity matrix generally provides
few clues about the underlying causative factors
responsible for the very apparent variation in the values of
the off-diagonal elements. As Shepard (1963), one of the
pioneering contributors to methodology in this field, has
pointed out:

Man's information processing system...is
notoriously unable to discern any
pattern in an array of numbers by
inspection alone. Therefore ... we must
first supplement this natural processing
system, with artificial machinery more
specifically designed for the task at
hand; namely the extraction of implicit
structure underlying the explicit but
bewilderingly enormous array of numbers
[p. 33].
It has only been with the comparatively recent development
of modern mathematical methods of data reduction such as
hierarchical clustering (Johnson, 1967), factor analysis
(Harman, 1967), and multidimensional scaling (Torgerson,
1958; Shepard, 1962; Kruskal, 1964a; 1964b) that it has
become possible to gain access to the presumed latent
structure in the raw proximity matrix.


The family of procedures known as multidimensional
scaling (MDS) - and the same may be said of factor analysis
- attempts to achieve a conceptually useful representation
of the data matrix by interpreting the perceptual proximity
scores as distances (or more precisely, as some function of
distances) in a multidimensional space. The space is
generally, though not necessarily, taken to be Euclidean.
The actual dimensionality of the space is assumed to be
unknown and, together with the distances, it is one of the
parameters to be determined by the computational algorithm.
The dimensionality of the best fitting representation for
the set of objects (phonemic targets in this case, treated
as points in a perceptual space) is important for the
interpretation of the derived perceptual configuration. The
minimum number of orthogonal dimensions necessary to
adequately represent the optimal configuration of interpoint
distances is indicative of the minimum number of independent
variables required to account for the variation of the
scores in the proximity matrix.


MDS as a representational scheme favours (but does not
dictate) a perceptual model in which the phonemic targets
are recognized on the basis of a small number of independent
(or near independent) scalar features, which may or may not
have readily discernible correlates in the physical signal.
If the proximity matrix is mapped into a configuration of
points in a Euclidean space, the interpoint distances are
invariant under rigid rotation and placement of the origin
of the reference axes, and uniform stretching changes them
only by a common scale factor. Given such a
mapping, the commonly employed research strategy is to seek
a unique rotation of the reference axes which is (a) readily
interpretable and (b) supported by independent evidence.
Axis rotation and factor interpretation are central
preoccupations in MDS and factor analytic research, but at
this point more needs to be said about the model and
computational algorithms employed in MDS.
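
The indeterminacy of a Euclidean configuration described above can be checked numerically. The following is a minimal sketch, not part of the original study: the four-point configuration, the function names, and the particular rotation angle, shift, and stretch factor are all hypothetical values chosen for illustration.

```python
import math

def distances(points):
    """Euclidean distances for all pairs j < k, in a fixed pair order."""
    return [math.dist(p, q)
            for j, p in enumerate(points)
            for q in points[j + 1:]]

def rotate(points, angle):
    """Rigidly rotate 2-D points about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def translate(points, dx, dy):
    """Move the origin of the reference axes."""
    return [(x + dx, y + dy) for x, y in points]

def stretch(points, factor):
    """Stretch every axis by the same factor."""
    return [(factor * x, factor * y) for x, y in points]

# Hypothetical configuration of four stimuli in two dimensions.
config = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (-1.0, 2.0)]
base = distances(config)

rotated = distances(rotate(config, math.radians(37)))
shifted = distances(translate(config, 5.0, -2.0))
scaled = distances(stretch(config, 2.5))

# Rotation and translation leave every interpoint distance unchanged.
same_after_rotation = all(math.isclose(a, b) for a, b in zip(base, rotated))
same_after_shift = all(math.isclose(a, b) for a, b in zip(base, shifted))

# Uniform stretching multiplies every distance by the same constant,
# so the rank order of the distances is preserved.
ratios = [b / a for a, b in zip(base, scaled)]
```

Because all three transformations leave the pattern of distances intact, the fit to the proximities cannot by itself single out one orientation of the axes; that choice has to rest on interpretability and independent evidence.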


The computational algorithms used in the more powerful,
so-called "non-metric" MDS procedures (Kruskal, 1964;
Torgerson and Young, 1967) strive to attain a configuration
of n points that optimizes a monotonic best fit of the
derived interpoint distances to the pairwise proximity
ratings in a space of minimum dimensionality.
Characteristically, the computation begins by setting up an
arbitrary configuration of points of some specified
dimensionality (r, where r<n). The pairwise proximities read
from the input proximity matrix are then rank-ordered from
smallest to largest. This ranking serves as the criterion to
which the algorithm strives to monotonically match the
initially arbitrary set of interpoint distances. On each
iteration of the algorithm the points of the configuration
are shifted around by a small amount in the general
direction of a more adequate solution. A measure of the
degree of monotonic best fit between the derived distances
and the rank ordered proximities is computed after each
successive adjustment of the distances. Various measures of
best fit have been proposed but they all tend to behave in a
very similar fashion (Young, 1970). Kruskal's measure of
"stress" is the most commonly used estimate of goodness or
badness of fit and also the best researched. It is quite
similar to the least squares measure of fit used in
regression analysis:


$$
\text{Stress} = \sqrt{\frac{\sum_{j<k} \left(d_{jk} - \hat{d}_{jk}\right)^{2}}{\sum_{j<k} d_{jk}^{2}}}
$$

where $d_{jk}$ = derived distance between points j and k
      $\hat{d}_{jk}$ = monotonic best fitting distance
      with respect to the proximity $p_{jk}$.

The iterative convergence on the best fitting configuration
ceases when no further significant improvements in stress
can be achieved.
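
As an illustration of how the stress of a given configuration might be evaluated, the sketch below computes the monotonic best fitting distances with the pool-adjacent-violators procedure and then applies the ratio given above. It is a simplified sketch under stated assumptions: the input is treated as dissimilarities (larger value = further apart; similarity ratings would first have to be reversed), ties are not given any special treatment, and the function names and the four-pair example are hypothetical.

```python
import math

def _mean(block):
    total, count = block
    return total / count

def monotone_fit(values):
    """Least-squares non-decreasing fit (pool-adjacent-violators)."""
    blocks = []  # each block is [sum of values, number of values]
    for v in values:
        blocks.append([v, 1])
        # Merge backwards while neighbouring block means are out of order.
        while len(blocks) > 1 and _mean(blocks[-2]) > _mean(blocks[-1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted

def kruskal_stress(dissimilarities, distances):
    """Stress of derived distances against rank-ordered dissimilarities.

    Both lists must describe the same stimulus pairs in the same order.
    """
    order = sorted(range(len(distances)), key=lambda i: dissimilarities[i])
    d_sorted = [distances[i] for i in order]
    d_hat = monotone_fit(d_sorted)
    numerator = sum((d - h) ** 2 for d, h in zip(d_sorted, d_hat))
    denominator = sum(d ** 2 for d in d_sorted)
    return math.sqrt(numerator / denominator)

# Distances already in the same rank order as the data: perfect fit.
perfect = kruskal_stress([1, 2, 3, 4], [0.5, 0.9, 1.4, 2.0])

# One pair of distances out of order: the two offending values are
# pooled to their mean (2.5) before the residuals are taken.
imperfect = kruskal_stress([1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0])
```

Only the rank order of the input enters the computation, which is why the measure is unaffected by any monotonic rescaling of the proximity scores.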


Stress values vary between 0 (perfect fit) and 1 (no
fit at all), or may be expressed as percentages. Kruskal
(1964) provides a set of descriptive labels which he
suggests the user employ as a rough guideline for evaluating
the adequacy of the obtained monotonic matching. Systematic
research with artificial data (Klahr, 1969; Young, 1970;
Sherman, 1972) suggests that Kruskal's criterion is too
conservative, and that the evaluation of stress is a
complicated matter requiring that account be taken of the
number of objects scaled, the dimensionality of the final
solution, and the anticipated error of estimation of the
proximity scores.


The criterion of monotonic matching between the
proximities and the derived distances is an appealing one.
There are usually no grounds for anticipating, beyond
rank-ordering, what the precise form of the relationship between
the proximity scores and the hypothetical distances will or
should be. The actual form of the relationship is obtained
as a by-product of the computation from the final, monotonic
best fitting curve of the derived distances plotted against
the proximities.


The earlier, so-called "metric" variety of MDS (see
Torgerson, 1958) required the stronger assumption of
linearity between the proximities and the final obtained
distances. Where it is apparent from a non-metric solution
that the assumption of linearity is not too badly violated,
or where the proximities can easily be transformed to render
the relationship linear, it is advisable to employ the
metric MDS technique. Its computational algorithm is quite
different from the non-metric routines. If essentially the
same configuration is recovered with the metric technique
the investigator can be more confident in the uniqueness and
stability of his final solution.




MDS algorithms usually yield solutions in a range of
dimensions specified by the user. Mathematically, the
solution of lowest dimensionality that adequately represents
the proximity matrix is preferable because it is the most
strongly determined. An argument for minimum dimensionality
can also be made in terms of descriptive economy, which boils
down to: Why postulate the operation of a greater number of
variables than you need? However, the fact that stress values
usually decrease as the dimensionality of the
representational space increases makes the choice of the
optimal dimensionality in many cases rather difficult.
Customary practice is to plot the function of stress against
dimensionality and look for a point of diminishing returns
beyond which increasing the dimensionality does not
significantly improve stress ratings, in that each further
increase yields roughly the same small improvement and may be
considered to be fitting only the noise in the data. But
perhaps the major criterion employed for deciding on the
dimensionality rests (in practice at least) on the
interpretability of the rotated reference axes. The
investigator will be reluctant to extract from his data more
factors than those for which he can find a plausible
interpretation.


A fundamental question for the researcher is how
confident he can be that the apparent perceptual
structure shown in an obtained MDS solution reflects
real, latent structure inherent in the proximity matrix. Put
in the form of the null hypothesis: How can he be sure that
the variation in the off-diagonal values of the proximity
matrix is not randomly generated? After all, any matrix of
substantially non-zero entries will yield some kind of MDS
solution. The simplest and safest answer to this question
lies in the replication of findings over independent sets of
data. Another guide is the stress of the final solution.
Klahr (1969), and Stenson and Knoll (1969) obtained
distributions of stress values for random
matrices representing various numbers of perceptual objects
scaled over a range of dimensions. They found that where the
dimensionality of a solution is small with respect to n, the
number of objects scaled, and n>10, the standard deviation
of the corresponding distribution of stress values is very
small. The investigator can use these results to construct
rough confidence limits for testing the null hypothesis that
no latent structure exists in his proximity matrix.


In practice it is unlikely that no structure exists in
the data matrix, but rather, such structure as does exist
will be overlaid with a certain amount of measurement error.
The amount of error may be difficult to estimate. However,
it is useful for the user of MDS to know under what
conditions the solutions yielded by an MDS routine are
robust under the assumption of measurement error. The
standard procedure for testing an MDS algorithm is to
construct a particular configuration of n points in r
dimensions, subject the interpoint distances to some
arbitrary monotonic transformation, treat these transformed
distances as proximities, and see how well the original
structure can be recovered by the algorithm. Young (1969)
investigated the effect upon configuration recovery of
adding different levels of random error to the interpoint
distances, prior to the monotonic transformation to
"proximity" scores. Even at the highest level of error
studied (where the variance of the random error component =
35% of variance of interpoint distances) recovery of the
original configuration was good (correlation between
original and derived interpoint distances > .8), providing
the ratio between the number of objects scaled and the
dimensionality of the configuration was fairly high (>3).
Where the dimensionality of the representational space is
low compared to the number of objects being scaled, the
solution is strongly "overdetermined" (Shepard, 1962). This
is the reason why only the weak assumption of monotonic
relationship is necessary between the proximity scores and
the derived distances and why the algorithm seems to be
quite robust against the effects of error variation in the
proximity scores.
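
The data-generation half of the testing procedure described above can be sketched as follows. This is only an illustration under stated assumptions, not a reproduction of Young's study: the number of points, the uniform random configuration, the Gaussian form of the error, and the exponential transformation are all hypothetical choices, and the recovery step itself (running an MDS routine on the resulting proximities) is omitted.

```python
import math
import random

def pair_distances(points):
    """Euclidean distances for all pairs j < k."""
    return [math.dist(p, q)
            for j, p in enumerate(points)
            for q in points[j + 1:]]

def ranks(values):
    """Rank position (0 = smallest) of each value; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

rng = random.Random(0)          # fixed seed so the sketch is repeatable
n_points, n_dims = 12, 2        # hypothetical: 12 objects in 2 dimensions

# Step 1: construct a known configuration and its interpoint distances.
config = [tuple(rng.uniform(-1.0, 1.0) for _ in range(n_dims))
          for _ in range(n_points)]
true_d = pair_distances(config)

# Step 2: add random error whose variance is 35% of the variance
# of the true interpoint distances (Gaussian error is an assumption).
mean_d = sum(true_d) / len(true_d)
var_d = sum((d - mean_d) ** 2 for d in true_d) / len(true_d)
error_sd = math.sqrt(0.35 * var_d)
noisy_d = [d + rng.gauss(0.0, error_sd) for d in true_d]

# Step 3: apply an arbitrary monotonic transformation to obtain
# "proximity" scores (the exponential is an arbitrary choice).
proximities = [math.exp(2.0 * d) for d in noisy_d]

# The transformation changes the values but not their rank order,
# which is all that a non-metric routine uses.
rank_preserved = ranks(proximities) == ranks(noisy_d)
```

With 12 points there are 66 pairwise proximities constraining only 24 coordinates, which gives a concrete sense of what it means for a low-dimensional solution to be overdetermined.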


Standard "metric" and "non-metric" MDS and factor
analysis routines employ a Euclidean spatial metric and
yield solutions which are unique up to the point of axis
rotation. This may be regarded as an asset or a liability
depending on one's theoretical predisposition. The
rotational indeterminacy of standard Euclidean MDS and
factor analysis provides the investigator with greater
freedom in arriving at a theoretically satisfying solution.
However, when the obtained dimensionality of the solution is
comparatively high, selecting "the best" rotation for the
reference axes can be very difficult. Two or more
substantially different interpretations of the scaling
configuration may be suggested (each of which constitutes a
mathematically equivalent representation of the data
matrix).


In the area of factor analysis, general criteria (such
as Thurstone's (1947) "simple structure") have been proposed
and corresponding computational routines developed that will
take an initially "arbitrary" set of co-ordinate axes and
transform them into a new set of reference axes that
optimally meet the rotational criterion. The "simple
structure" criterion has proved popular because, by
simplifying the pattern of factor loadings on the input
variables, it has tended to yield solutions that are readily
interpretable in terms of those variables. Rotation in
accordance with the principal components criterion, which
locates the reference axes in accordance with eigenvectors
that encompass successively diminishing amounts of the
variance in the configuration, generally leads to less
readily interpretable factors than the simple structure
criterion. The historical debate between the "American" and
the "British" schools of factor analysis over the factorial
structure of human ability involves just this point.


In recent years, the development of three-mode methods
of MDS has provided the possibility of obtaining
rotationally unique Euclidean scaling solutions (Carroll &
Chang's INDSCAL, 1970; Harshman's PARAFAC, 1970). Whether or
not these procedures yield "explanatory" as distinct from
merely "descriptive" factors (see Harshman, 1970) is a moot
point. The model is able to attain rotational uniqueness by
making certain (quite strong) assumptions about the nature
of the variability across the third mode, which is usually
taken to be individuals.


With the INDSCAL method, variation in the third mode is
restricted to the application of a weighting factor ($w_{it}$)
applied to a given individual i on a given dimension t.
Conceptually this weighting factor may be interpreted as the
relative salience a particular underlying perceptual
dimension has for a given individual. The model for the
interpoint distance between two stimuli j and k for
individual i is:

$$
d_{jk}^{(i)} = \left[\sum_{t} w_{it} \left(x_{jt} - x_{kt}\right)^{2}\right]^{1/2}
$$

where $x_{jt}$ is the co-ordinate of stimulus j on dimension t
of the common space.
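
The role of the individual weights can be made concrete with a small numerical sketch. The code below is an illustration only, assuming the weighted Euclidean form of the INDSCAL distance; the stimulus co-ordinates, the three sets of subject weights, and the function name are hypothetical.

```python
import math

def indscal_distance(x_j, x_k, weights):
    """Weighted Euclidean distance between stimuli j and k for one subject.

    x_j, x_k : co-ordinates of the two stimuli in the common space
    weights  : that subject's salience weight for each dimension
    """
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, x_j, x_k)))

# Two stimuli in a hypothetical two-dimensional common space.
x_j = (0.0, 0.0)
x_k = (3.0, 4.0)

# Three hypothetical subjects who differ only in dimension weights.
d_equal = indscal_distance(x_j, x_k, (1.0, 1.0))    # both dimensions equally salient
d_first = indscal_distance(x_j, x_k, (4.0, 0.25))   # dimension 1 emphasised
d_second = indscal_distance(x_j, x_k, (0.0, 1.0))   # dimension 1 ignored
```

Weighting dimension t by a given value is equivalent to stretching that axis by the square root of the value before taking an ordinary Euclidean distance. Since subjects may differ only by such axis-wise stretching, a rotation of the common axes would generally destroy the fit, which is the source of the rotational uniqueness claimed for the method.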


A principal disadvantage of the method
is that it is limited to the case in
which individual subject spaces are
related by linear transformations of a
common space. Even the linear
transformations allowed are not general,
but are restricted to those given by
diagonal transformation matrices. The
method may require too many dimensions
in cases where the perceptual spaces
represent nonlinear distortions of a
common space (or where more general
linear transformations are required)
[Carroll & Chang, 1970, p. 316].
In practice, this assumption may not prove to be too
restrictive. Carroll and Chang (1973) provide, among other
illustrations, one with a set of data on similarity ratings
of synthetic auditory stimuli (Bricker, Pruzansky, &
McDermott, 1968) which is relevant to the subject of this
paper. The INDSCAL analysis of subjects' similarity ratings
yielded a perceptual configuration in essential agreement
with that which would have been predicted from the physical
parametric variations used in constructing the set of
stimuli. This alone is no more than might be expected from a
good standard MDS routine. What was remarkable, however, was
that the placement of the reference axes corresponded
(without rotation) in a one-to-one fashion with the a priori
physical dimensions manipulated in the synthesis of the
stimuli.


Carroll and Chang argue that the proof of the pudding
is in the eating:

In cases where a set of a priori physical
or theoretical dimensions were known,
the recovered (unrotated) dimensions
have always (to date) corresponded to
them in essentially one to one fashion.
We therefore argue that it is
appropriate to analyse data in terms of
this very strong and specific model, and
that only if this model fails to fit the
data adequately should one have recourse
to a more general model [p. 285].


On the other hand, in an exploratory investigation it is
undesirable to have the interpretability of the solution too
dependent on strong assumptions of the scaling model. In
this context the rotationally non-unique, standard MDS
procedures have much to recommend them, at least in the
early stages of factor identification.


Kruskal (1964) introduced the option of scaling in 
non-Euclidean spaces. His MDS routine is applicable to the 
general class of Minkowski spatial metrics of which the 
Euclidean and the "city block" metrics are special cases. 
The formula for calculating interpoint distances for the 
general Minkowski spatial metric is: 

    d_{jk} = \left[ \sum_{i} |x_{ji} - x_{ki}|^{n} \right]^{1/n} 

where x_{ji} is the projection of point j on reference axis i 
and n = some positive real number. 
In the case of the "city block" metric where the exponent 
(n) is 1.0, the distance between two points is simply the 
sum of the absolute differences of the point projections on 
the reference axes. In other words, equal weight is given to 
the differences on each dimension, regardless of their 
relative magnitude, in the determination of the interpoint 
distances. It is intuitively evident that, as n increases, 
progressively more weight is given to the larger differences 
on the coordinate axes. In the limiting case where n → ∞, 
only the largest difference on any dimension contributes to 
the interpoint distance d_{jk}. 
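
The effect of the exponent can be checked numerically. The following is a minimal sketch in Python and is not part of Kruskal's routine; the function name `minkowski_distance` and the two sample points are assumptions introduced purely for illustration.

```python
import math


def minkowski_distance(x, y, n):
    """Distance between points x and y under the Minkowski metric with exponent n.

    n = 1 gives the "city block" metric, n = 2 the Euclidean metric, and
    n = math.inf the limiting case in which only the largest difference counts.
    """
    if len(x) != len(y):
        raise ValueError("points must have the same number of coordinates")
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if math.isinf(n):
        return max(diffs)
    return sum(d ** n for d in diffs) ** (1.0 / n)


# Two hypothetical points whose projections differ by 3 and 4 on two axes.
j = (0.0, 0.0)
k = (3.0, 4.0)

city_block = minkowski_distance(j, k, 1)       # sum of absolute differences
euclidean = minkowski_distance(j, k, 2)        # ordinary straight-line distance
large_n = minkowski_distance(j, k, 50)         # already close to the limit
dominance = minkowski_distance(j, k, math.inf) # largest difference only

print(city_block, euclidean, large_n, dominance)
```

With these sample points the distance falls from 7 (city block) through 5 (Euclidean) towards 4, the larger of the two axis differences, as the exponent grows.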


In this family of spatial metrics, only the Euclidean 
preserves invariance of the interpoint distances over axis 
rotation. The question naturally arises as to what kind of 
spatial metric is appropriate for the representation of 
perceptual distances. This is very difficult to answer. One 
approach to this problem has utilized arrays of simple 
physical stimuli where the dimensions and the levels of 
variation within a dimension are fairly clearly established. 
Interpoint distances are derived, usually on the basis of a 
Euclidean or a city block metric, from a set of perceptual 
proximity measures of one kind or another. The test of the 
adequacy of a given spatial metric is its ability to yield 
perceptual distances that closely conform with the physical 
parameters of the stimuli. 


Several studies along these lines (Attneave, 1959; 
Torgerson, 1952, 1965; Shepard, 1964; Hyman & Well, 1967, 
1968; Garner and Felfoldy, 1970) have resulted in a 
categorical distinction between "analysable" and 
"unanalysable" stimuli. In the former case, the underlying 
perceptual dimensions are obvious and distinct to the 
subject, such as when the stimulus material consists of 
simple geometric patterns varying in, for example, size and 
angle of inclination. In the latter case, the underlying 
perceptual dimensions are not distinct and obvious to the 
subject but qualitatively "integrated". More elementary 
perceptual objects, such as colour sensation, or simple 
auditory or tactile displays, belong to this class of 
stimuli. Only the "unanalysable" stimulus displays have been 
found, by the above methodology, to scale adequately in 
accordance with the Euclidean metric. "Analysable" stimuli 
have been found to conform better to the city block space, 
or, because of some gross violation of the triangular 
inequality, not to map satisfactorily into any well 
understood geometric representation (Shepard, 1964). 


On the basis of these studies it may be concluded that 
scaling in accordance with a Euclidean metric is appropriate 
for elementary perceptual targets such as phonemes embedded 
in a constant syllabic frame, where the underlying 
perceptual dimensions are non-obvious. However, a cautionary 
note is warranted, because it seems precisely in those cases 
where the underlying perceptual dimensions are 
"unanalysable" that the method of vindicating a spatial 
metric, by showing that it yields a perceptual configuration 
in close agreement with the supposed relevant physical 
parameters of the stimuli, is most questionable. 


Another approach to this problem that has been tried is 
that of Terbeek and Harshman (1971), who were led to question 
the validity of the Euclidean metric for vowel perception. 
They consistently found an extra and interpretively 
intransigent dimension in their scaling solutions, loadings 
on which turned out to be highly predictable by a simple 
(non-linear) function of two other dimensions in the scaling 
solution. These results were consistent with a hypothesis of 
spatial curvature which could create conditions leading to 
the extraction of an extra, spurious dimension when the 
perceptual distances were inappropriately represented in a 
Euclidean spatial metric. 


No strong theoretical considerations support the choice 
of a Euclidean over some alternative spatial metric, but for 
pragmatic reasons the Euclidean metric has much to recommend 
it. The properties of a Euclidean model are well understood. 
Only the more recent "non-metric" varieties of MDS allow for 
other than Euclidean solutions and therefore cross-technique 
comparisons to test the stability of a derived configuration 
can only be made within the Euclidean framework. Also, 
parameter testing experiments with synthetic data (Sherman, 
1972) have suggested that the choice of the spatial metric 
significantly affects the retrievability of a configuration 
only when the dimensionality of the solution is correctly 
identified. 


CHAPTER V 


EXPERIMENTAL RESULTS 


In this chapter a sequence of four inter-related 
experiments is described, in which the general aim was to 
isolate and determine the most salient perceptual dimensions 
involved in the recognition of a selected set of English 
consonantal phonemes in open syllable position. In each 
case, proximity matrices were obtained to characterise the 
perceptual relationships between all items in the stimulus 
set. These were derived from either direct or indirect 
similarity judgements. A variety of multivariate scaling 
procedures - "non-metric" MDS (Kruskal, 1964); "metric" MDS 
(Torgerson, 1958); Principal Components Factor Analysis 
(Harman, 1967); and hierarchical clustering (Veldman, 1967) - 
were employed, chiefly to test the robustness of derived 
solutions under conceptually related but computationally 
diverse analytical techniques. In all cases, group rather 
than individual proximity matrices were analysed. This was 
necessitated by practical considerations of subject 
availability, which in turn influenced the choice of data 
collection procedures. However, it seemed to be a reasonably 
safe assumption that on a basic perceptual task of the kind 
involved in these experiments, intersubject variability 
should not be of significant theoretical interest. But this 
remains an untested assumption that ought to be considered 
in future investigations. 


Experiment I was an exploratory investigation employing 
MDS of direct similarity judgements of English consonants 
embedded in a /Ca/ syllabic frame. Two factors, thought to 
be basic auditory features that could be used to 
differentiate the stimuli, were tentatively identified. 
Experiment II was a larger study concerned with replicating 
the findings of Experiment I and determining the stability 
of the obtained perceptual configuration when a phonetically 
quite different vowel is employed in the constant (carrier) 
/CV/ frame. Results indicated that while the basic 
configuration is maintained, some significant (systematic) 
perturbation took place as a function of the phonetic 
quality of the vowel. Experiment III constituted an attempt 
to determine whether, on the basis of the factors discerned 
in Experiments I and II, it would be possible to predict the 
locations, in perceptual space, of consonants not included in 
the original scaling set. In Experiment IV a proximity 
matrix for the stimuli used in Experiments I and II was 
generated from indirect estimates of perceptual similarity 
based on semantic scaling. The rationale for this experiment 
was simply that if the same perceptual dimensions 
successfully describe proximity matrices generated by two 
quite different, but theoretically well motivated data 
bases, then the case for the psychological reality of such 
perceptual dimensions is considerably strengthened. 




The subjects for these experiments were drawn from 
several sources: 

(i) Students enrolled in general introductory 
linguistics courses at the University of Alberta 
(Experiment I, n=27; Experiment IV, n=30). 

(ii) First and second year junior college students 
from the Red Deer and Medicine Hat Colleges in Alberta 
(Experiment II, n=906). 

(iii) Students enrolled in an evening course in 
English Literature at the University of Alberta (Experiment 
III, n=22). 

The great majority of the subjects may be 
described as naive with respect to formal phonetic training. 
All subjects included in the analysis were native speakers 
of English, where "native speaker" is defined as having 
"used English as your major language since you were five 
years of age." 


Experiment I 


In this investigation 12 English consonants 
/p,b,t,d,č,s,š,z,n,m,l,h/ embedded in a /Ca/ syllabic frame 
were scaled for perceptual similarity employing a modified 
method of triadic comparisons. The obtained group proximity 
matrix was subjected to Kruskal's method of MDS and also to 
a hierarchical clustering analysis. 

Because the total number of trials in an experiment 
employing triadic comparisons increases very quickly as a 
function of the number of objects scaled, this study was 
constrained to operate with a subset of the consonantal 
inventory. An attempt was made, admittedly on intuitive 
grounds, to make the chosen set representative of the range 
of perceived auditory variability encountered in the 
complete set of consonantal phonemic targets. 




The modified method of triadic comparisons used in this 
experiment is an adaptation of the method of "complete 
triadic comparisons" outlined in Torgerson (1958). Consider 
the set of all possible three-way (triadic) combinations of 
n perceptual objects to be scaled. Subjects are presented 
with one triad at a time, drawn randomly from the 
n!/((n-3)! 3!) triadic combinations. On each trial they are 
instructed to assign a rating of 2 to the pair of stimuli 
out of the 3 which are most alike, and a rating of 0 to the 
two which are least alike. The remaining unchosen pair is 
assumed to take a proximity rating somewhere between the 
"most alike" and the "least alike" pairs and is 
automatically scored 1. Hence over all triadic combinations, 
pairs most frequently chosen as "most alike" will emerge 
with a high score and those mainly chosen "least alike" with 
a low one on a scale of overall similarity. This method, 
which requires that the objects be presented three at a 
time, works better for visual than auditory stimuli since 
the latter cannot be scanned, but must be distributed in 
time. To mitigate possible problems of trace decay in short- 
term auditory memory, each triad was decomposed into three 
simple pairwise judgements. To illustrate the strategy, 
consider the possible triad: 


          ta 

     pa        da 


from which the following pairs may be formed: 


        A            B 
     pa - ta      pa - da 
     ta - pa      ta - da 
     da - pa      da - ta 


Each element of the triad acts once in the three pairwise 
presentations as the standard for a simple perceptual 
judgement: Which pair of syllables (A or B) sounds most 
alike? 
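
The decomposition of triads into pairwise judgements can be sketched in a few lines of Python. This is an illustration only, not the procedure used to prepare the stimulus tapes: the ASCII syllable labels (with "cha" and "sha" standing in for the affricate and the alveopalatal fricative), the function name `pairwise_trials`, and the fixed random seed are all assumptions of the sketch. It covers a single pass through the triads and does not model how trials were allotted to blocks or subjects.

```python
from itertools import combinations
from math import comb
import random

# ASCII stand-ins for the 12 /Ca/ syllables (hypothetical labels).
SYLLABLES = ["pa", "ba", "ta", "da", "cha", "sa", "sha", "za",
             "na", "ma", "la", "ha"]


def pairwise_trials(triad):
    """Split one triad into three trials, each member serving once as standard.

    Every trial is a pair of pairs (A, B) that share the same standard.
    """
    trials = []
    for standard in triad:
        first, second = [s for s in triad if s != standard]
        trials.append(((standard, first), (standard, second)))
    return trials


# All three-way combinations of the 12 syllables: n! / ((n-3)! 3!) of them.
triads = list(combinations(SYLLABLES, 3))
trials = [trial for triad in triads for trial in pairwise_trials(triad)]

# Randomize presentation order; the fixed seed only makes the sketch repeatable.
random.Random(0).shuffle(trials)

print(len(triads), comb(len(SYLLABLES), 3))
print(len(trials))
print(pairwise_trials(("pa", "ta", "da")))
```

For the triad (pa, ta, da) the function returns the same three A/B presentations as the display above, and one pass through the 12 syllables gives three trials for each triad.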


The total set of pairwise presentations generated from 
all triadic combinations was recorded by the experimenter 
in a randomized order, with appropriate pause intervals 
placed between the stimuli. A single speaker (the 
experimenter) was used for recording all the stimuli. 
Because of the large number of trials in the overall 
experiment (1,980) each subject judged only a portion of the 
total set of comparisons. Twenty-seven subjects were used, 
divided into nine groups assigned to systematically 
overlapping blocks of trials (220 per block). Each testing 
session lasted approximately 30 minutes. Subjects checked 
their responses (either the first or second pair for any 
trial) on an optically scorable IBM answer sheet. 


Procedure 


A group testing situation was used. After an informal 
introduction to the purpose of the experiment and a general 
description of the experimental task, subjects were orally 
presented with the following instructions: 


You will hear pairs of syllables 
presented over the loudspeaker. Pairs 
such as: (pause) pa - ca (pause) pa - ba 
(pause). Your task is to ask yourself 
which of the two pairs sounds more 
alike: the first or the second pair? In 
this case most people would probably say 
that the pair pa - ba sounds more alike 
than the pair pa - ca. Pa - ba was the 
second of the two pairs, so in this case 
you would mark the second alternative on 
your answer sheet. On the other hand, if 
the syllables in the first pair sound 
more alike than the syllables in the 
second pair, you would put a mark in 
column one of your answer sheet. On some 
items you will have difficulty deciding 
which of the two pairs sounds most 
alike. However, we would like you to 
choose one of them even if it seems 
sometimes like "guesswork." Don't spend 
too long making up your mind. It's your 
first impression that we are interested 
in. In deciding which of the two pairs 
is most alike just go on the sound of 
the syllables, not on how they might 
have been produced... (further 
instructions about recording responses 
on answer sheets)... Any questions? 


Scoring the Responses 


A group proximity matrix was accumulated in the 
following manner: An initially empty 12 x 12 scoring array 
was set up in which the rows represented the 12 syllables 
(or phonemes) as standards and the 12 columns represented 
possible responses. On a given trial, one of the two 
possible response syllables (the columns of the scoring 
matrix) may be associated with a particular syllable (or 
phoneme) as standard (the row elements). A "1" was added to 
the appropriate row and column intersection of the scoring 
matrix each time a particular standard and response were 
associated by being chosen as the "more similar" pair. When 
accumulated over all the experimental trials, the scoring 
matrix indicates the relative frequency with which 
particular syllable pairs are chosen as "more similar" than 
all other pairs. The accumulated scoring matrix is 
approximately (but not precisely) symmetrical. To meet the 
necessary assumption of symmetry and possibly improve the 
stability of the similarity scores, the corresponding off- 
diagonal elements of the cumulative scoring matrix were 
summed to produce a symmetrical proximity matrix (Table 
5.1). 
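
The scoring rule just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the scoring program actually used: the function names `accumulate_scores` and `symmetrize`, the three-syllable label set, and the five responses are hypothetical. Each response is represented as a (standard, chosen) pair, i.e. the standard of the trial and the response syllable of the pair judged "more similar".

```python
def accumulate_scores(labels, responses):
    """Build the scoring array: rows are standards, columns are chosen responses."""
    index = {label: i for i, label in enumerate(labels)}
    size = len(labels)
    scores = [[0] * size for _ in range(size)]
    for standard, chosen in responses:
        # Add 1 at the intersection of the standard's row and the response's column.
        scores[index[standard]][index[chosen]] += 1
    return scores


def symmetrize(scores):
    """Sum corresponding off-diagonal cells; the diagonal is left at zero."""
    size = len(scores)
    return [[scores[i][j] + scores[j][i] if i != j else 0
             for j in range(size)]
            for i in range(size)]


# Hypothetical data: three syllables and five "more similar" choices.
labels = ["pa", "ba", "ta"]
responses = [("pa", "ba"), ("ba", "pa"), ("pa", "ba"), ("ta", "pa"), ("ba", "ta")]

scores = accumulate_scores(labels, responses)
proximity = symmetrize(scores)

for label, row in zip(labels, proximity):
    print(label, row)
```

In the toy data the raw scoring array is not symmetrical (pa as standard is paired with ba twice, ba as standard with pa once); summing the two cells gives a single proximity value for the pair.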
TABLE 5.1 

PROXIMITY MATRIX EXPERIMENT I 

       p    b    t    d    č    s    š    h    z    m    n 
b    101 
t     77   66 
d     76   90  107 
č     32    ?    ?   67 
s      ?   40   54   44    ? 
š     45   28   51   38  104   82 
h     63   75   56   57   64   53   63 
z     35   48   43   53   64  101   82   45 
m      ?    ?   29   68    ?    ?    ?    ?    ? 
n     60   63   50   76   33   41   39   78   46  105 
l     61   68   47   63   29   49   45   78   50   90   92 

(? = entry illegible in the source copy) 


Results 


The next step was to determine the optimal spatial 
configuration for the proximity matrix. For reasons given 
earlier, the Euclidean metric was chosen as a basis for 
computing the interpoint distances in spaces ranging from 
one to nine dimensions. 


Figure 5.1 shows the adequacy of monotonic matching 
(stress) as a function of the dimensionality of the 
solution. Several computational runs from different starting 
dimensions were made to help ensure that the preferred 
dimensionality would emerge clearly. According to the 
criterion of minimal stress with minimal dimensionality, the 
graph indicates that a three-dimensional solution (with a 
"good" stress of 5% by Kruskal's conservative rating) is the 
preferred solution, although the two-dimensional 
configuration (stress=10%) should be considered in 
attempting to arrive at an adequately interpretable 
solution. Figure 5.1 also shows confidence limits, based on 
Klahr (1969), for the null hypothesis of no latent structure 
in a proximity matrix for 12 objects scaled in 1 to 5 
dimensions. Clearly the null hypothesis may be rejected. 
However, with only 12 objects scaled, one would be 
disinclined to accept a dimensionality higher than four. 


Fig. 5.1 Stress X Dimensionality 
Experiment I 
(Dotted lines and bars indicate mean and variance of stress 
values obtained with random data (Klahr, 1969); bar = 2 
standard deviations. Solid line = stress values from 
independent computations with different starting 
dimensionalities.) 
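
For reference, the stress figure quoted above can be sketched as follows. The sketch assumes Kruskal's (1964) stress formula 1 and takes the disparities (the monotonically fitted values) as given rather than computing them by monotone regression. The function names, the toy numbers, and the treatment of Kruskal's verbal benchmarks (20% poor, 10% fair, 5% good, 2.5% excellent, 0% perfect) as upper bounds of intervals are assumptions of this illustration, not part of the analysis reported here.

```python
import math


def stress_1(distances, disparities):
    """Kruskal's stress formula 1 for matched lists of distances and disparities."""
    numerator = sum((d - h) ** 2 for d, h in zip(distances, disparities))
    denominator = sum(d ** 2 for d in distances)
    return math.sqrt(numerator / denominator)


def kruskal_rating(stress):
    """Verbal label for a stress value, reading each benchmark as an upper bound."""
    benchmarks = [(0.0, "perfect"), (0.025, "excellent"), (0.05, "good"),
                  (0.10, "fair"), (0.20, "poor")]
    for bound, label in benchmarks:
        if stress <= bound:
            return label
    return "worse than poor"


# Hypothetical configuration distances and their fitted disparities.
distances = [3.0, 4.0]
disparities = [3.0, 4.5]

example_stress = stress_1(distances, disparities)
print(round(example_stress, 3), kruskal_rating(0.05), kruskal_rating(0.10))
```

Under this reading a stress of 5% is labelled "good" and a stress of 10% "fair", which matches the ratings cited for the three- and two-dimensional solutions.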


The three-dimensional configuration was examined first 


Fig. 5.2 Three Dimensional Configuration 
Experiment I Kruskal Scaling 


arbitrary, but two of the three orthogonal axes in the 
three-dimensional solution seemed to be readily 
interpretable. 


Dimension I locates the resonants /la,ma,na/ on one 
pole and the affricate /ča/, the voiceless stop /ta/ and the 
fricatives /za,sa,ša/ on the other. It opposes sounds with 
harmonic and formant structure and a "musical" quality to 
those with spectral energy distributions characteristic of a 
turbulent noise source. For want of better terminology this 
was tentatively labelled the "resonance - hiss" dimension, 
or simply "resonance." 


Unrotated Dimension III (/ta,pa,ba/ versus /za,sa/) could 
be construed as a temporal factor: the duration of the 
consonantal portion of the syllable. Consistent with this 
interpretation, the nasals, the lateral, and the affricate 
are located in the middle of this continuum. However, 
inconsistent with this interpretation, the stop /da/, instead 
of loading positively with the other stops on unrotated 
Dimension III, takes an intermediary value together with 
/ča,ma,la,na/. 


Unrotated Dimension II is even more difficult to 
conceptualize. What plausible auditory, acoustic, or 
articulatory scale would oppose the voiced stops /da,ba/ to 
the sibilants /ša,sa/ with medial values assigned to 
/pa,za,la/? 


A considerable improvement in interpretability of the 
three-dimensional solution may be obtained with a 45 degree 
clockwise rotation of the reference axes in the DII-DIII 
plane (see Figure 5.2). The notable discrepancy that clouded 
the interpretation of Dimension III as a temporal factor is 
removed. Dimension III' clearly opposes the stops to the 
continuants. Dimension II' may be interpreted as a voicing 
dimension. All the voiceless consonants have positive 
loadings on this (rotated) axis and all the voiced 
consonants are negatively loaded. 
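
What such a rotation does to the loadings can be illustrated with a short sketch. The coordinates below are hypothetical and are not taken from the scaling solution; the function names and the sign convention (new axes obtained by turning the old ones clockwise) are likewise assumptions. The sketch also shows why a rotation of this kind is legitimate only in a Euclidean space: Euclidean interpoint distances are unchanged by it, whereas city block distances are not.

```python
import math


def coords_after_axis_rotation(point, degrees_clockwise):
    """Coordinates of a fixed point after the two axes are rotated clockwise."""
    theta = math.radians(degrees_clockwise)
    x, y = point
    # Projections of the point on the rotated axes
    # (cos t, -sin t) and (sin t, cos t).
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))


def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def city_block(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


# Two hypothetical stimuli with loadings on the unrotated (DII, DIII) axes.
a = (1.0, 1.0)
b = (-1.0, 0.5)

a_rot = coords_after_axis_rotation(a, 45)
b_rot = coords_after_axis_rotation(b, 45)

print(a_rot)                                    # loads on one rotated axis only
print(euclidean(a, b), euclidean(a_rot, b_rot))   # unchanged by the rotation
print(city_block(a, b), city_block(a_rot, b_rot)) # changed by the rotation
```

A stimulus that loads equally on the two unrotated axes, like the first point here, loads on only one of the rotated axes, which is the sense in which a rotation can make an axis easier to interpret without altering the configuration itself.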
Examination of the two-dimensional solution revealed an 
interesting parallel with the three-dimensional 
configuration. The two dimensions of the three-dimensional 
solution - tentatively identified as "duration" and 
"resonance" - are clearly suggested by the two-dimensional 
configuration (see Figure 5.3). The two-dimensional 
configuration is also useful for representing the three 
major perceptual groupings that emerged when the proximity 
matrix was subjected to a hierarchical clustering algorithm 
(Veldman, 1967). It is interesting that these three 
statistically derived clusters correspond to the traditional 
manner of articulation categories which are represented in 
the scaling set of phonemes: the stops, the resonants, and 
the sibilants. 


Even within the major clusters, the target phonemes 
appear to be differentiated in the appropriate manner by the 
two inferred dimensions of sonority and duration, which 
would suggest these are scalar rather than categorical 
perceptual features. However, before these results could 
reasonably sustain the weight of such an interpretation a 
number of questions about their reliability, replicability, 
and range of applicability need to be answered. 


Fig. 5.3 Two Dimensional Solution 
Experiment I Kruskal Scaling 


Experiment II 


Experiment II constituted a replication and extension 
of the first experiment, with MDS of triadic comparisons of 
perceptual similarity. Two sets of stimuli were scaled in 
interpolated trials. One set was the same as that employed 
in Experiment I - 12 consonantal phones embedded in a /Ca/ 
syllabic frame. The second set comprised the same 12 
consonants but combined with the vowel /i/. The 1,320 trials 
resulting from a randomized combination of the two sets of 
660 (A or B) pairwise comparisons were divided into 6 
blocks of 220 trials. There were 15 to 24 
subjects per block of trials. The experimental procedure was 
identical with that of Experiment I, except for the method 
of constructing the stimulus tapes. Instead of separately 
recording each trial, the basic set of 24 CV syllables was 
recorded once, digitized by a program written for the PDP-12 
computer (Roszypal, 1973) at a sampling rate of 15kHz and 
stored on LINC tape. A second program constructed the 
experimental tape from the LINC-stored stimuli. 


Results 


Group proximity matrices for both the /Ca/ and the /Ci/ 
sets were accumulated (Appendix D) and the data were scaled 
in from one to four dimensions by Kruskal's method. The plot 
of stress against dimensionality (Appendix D) did not reveal 
a clearly preferred dimensionality for the /Ca/ set. For the 
/Ci/ set the scree test suggested that the two-dimensional 
solution (stress = 9%) was preferable. 


The two and three-dimensional Kruskal solutions for the 
/Ca/ set were compared with the two and three-dimensional 
solutions of Experiment I to determine the degree of 
replicability of the scaling configurations over independent 
data sets and to see whether the same hypothetical 
dimensions of "resonance-hiss", "duration", and "voicing" 
could be maintained. For the two-dimensional solution, a 
comparison of Figures 5.4a and 5.3 shows a satisfactory 
replication of the results of Experiment I. When the axes of 
Figure 5.4a are rotated (graphically) to conform to the 
orientation of Figure 5.3 it will be noted that the loadings 
of the sounds on each of the dimensions are highly similar. 
There is some discrepancy with respect to the stop 
consonants on the "hiss - resonance" dimension. It can be 
seen that /h/ shows a slightly stronger tendency to cluster 
with the resonants. Rather than attempt to interpret these 
minor discrepancies, which could have their origin in the 
different stimulus sets used in the two experiments, it 
seems better to regard them as indicative of limits on the 
accuracy of resolution obtainable with the data collection 
procedures used in these experiments. 


Fig. 5.4 Two Dimensional Solution 
Experiment II Kruskal Scaling 
(a) /Ca/ set (b) /Ci/ set 


The two most prominent dimensions found in the first 
experiment, "resonance-hiss" and "duration", are clearly 
apparent in the two-dimensional solution of Experiment II. 
They may also be discerned in the three-dimensional solution 
for the /Ca/ set (Appendix D). However, the "voicing" 
dimension, which was clearly the weakest dimension in the 
results of Experiment I, failed to emerge in the three- 
dimensional solution for the /Ca/ set in Experiment II. 
(Though there is an equivocal suggestion of a voicing 
dimension: see Appendix D.) The reason for this failure to 
yield a clearly discernible voicing dimension in Experiment 
II may lie in differences between the subject pools of the 
two experiments. Most of the subjects employed in Experiment 
I had some knowledge of linguistics and therefore probably 
some familiarity with phonetic description. As the voicing 
feature is particularly prominent in traditional phonetic 
classifications and pedagogic illustrations of phonological 
rules, it is plausible to suggest that its clear presence in 
the data of Experiment I but apparent absence in Experiment 
II simply reflected differential linguistic training of the 
two groups. 


The two replicable dimensions of "resonance" and 
"duration" found with the /Ca/ stimulus sets are apparent 
also in the scaling configurations for the /Ci/ set of 
Experiment II. They emerge most clearly in the two- 
dimensional solution (see Figure 5.4b), but are also 
discernible in the three-dimensional solution (Appendix D) 
where, again, there is a weak suggestion of a "voicing" 
dimension. 


In both the two and the three-dimensional solutions for 
the /Ci/ set, the clarity of the hypothesised temporal 
factor is obscured by a transposition of the rank ordering 
of the sibilants on this dimension. To further clarify the 
question of whether this shift is attributable to structural 
differences in the two proximity matrices and not to a 
computational indeterminacy associated with the scaling 
algorithm, the data of Experiment II were subjected to 
Torgerson (1958) scaling and a Principal Components factor 
analysis program, with options of varimax and oblique axis 
rotation (Program DERS:FACT04 in the Division of Educational 
Research program library, University of Alberta). 


The input to the Torgerson scaling program 
(DERS:SCAL05) was a comparative distance matrix based upon z 
score transformations of "proportion of choice" scores (see 
Torgerson, 1958). The program estimates absolute distances 
for spaces of varying dimensionality by an iterative 
procedure based on Messick and Abelson (1956). For major 
steps in the computation of the interpoint distances and 
details of the scaling solution see Appendix E. 
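
The z score transformation of a "proportion of choice" score can be illustrated with the inverse of the standard normal distribution. The following is a minimal sketch, assuming Python's `statistics.NormalDist`; the function name `proportion_to_z`, the sample proportions, and the clipping of extreme proportions (needed because proportions of exactly 0 or 1 have no finite z score) are assumptions of the sketch and not details of DERS:SCAL05.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def proportion_to_z(p, floor=0.01):
    """Unit normal deviate corresponding to a choice proportion p.

    Proportions are clipped to [floor, 1 - floor] before conversion so that
    unanimous choices still yield a finite value (an assumption of this sketch).
    """
    clipped = min(max(p, floor), 1.0 - floor)
    return STANDARD_NORMAL.inv_cdf(clipped)


# Hypothetical proportions of trials on which one pair was chosen over another.
proportions = [0.2, 0.5, 0.8, 0.975, 1.0]
z_scores = [proportion_to_z(p) for p in proportions]

for p, z in zip(proportions, z_scores):
    print(p, round(z, 3))
```

A proportion of one half maps to zero, and proportions equally far above and below one half map to deviates of equal size and opposite sign.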


For the factor analysis, the rows of the proximity 
matrices for the Kruskal analysis (Appendix B) were 
intercorrelated. Each of the two resulting correlation 
matrices (Appendix F) represents a set of indices of 
similarity: namely, the similarity of any given syllable as 
a standard (i.e., "X" in the X - A, X - B trial) with any 
other syllable as a standard in the set of 12 stimuli. 
Details of the factor analysis are given in Appendix F. 
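
Intercorrelating the rows of a proximity matrix can be sketched as below. The example is hypothetical throughout: the 4 x 4 matrix is invented, the function names are introduced here, and the diagonal cells are simply included in each row as zeros, since the handling of the diagonal is not specified in the text. Pearson's product moment correlation is assumed as the index.

```python
import math


def pearson(u, v):
    """Pearson product moment correlation between two equal-length rows."""
    n = len(u)
    mean_u = sum(u) / n
    mean_v = sum(v) / n
    covariation = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    spread_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    spread_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return covariation / (spread_u * spread_v)


def row_correlations(matrix):
    """Correlation of every row of the matrix with every other row."""
    return [[pearson(row_i, row_j) for row_j in matrix] for row_i in matrix]


# Hypothetical symmetric proximity matrix for four stimuli.
proximity = [
    [0, 90, 30, 25],
    [90, 0, 35, 20],
    [30, 35, 0, 85],
    [25, 20, 85, 0],
]

correlations = row_correlations(proximity)
for row in correlations:
    print([round(r, 2) for r in row])
```

Two stimuli correlate highly under this index when they stand in a similar pattern of proximities to the remaining stimuli, which is a somewhat different notion of similarity from the raw proximity score between them.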


Both the Torgerson and the Factor Analysis programs 
yield, in the first instance, a "principal axis" type of 
solution, i.e., a configuration in which the first axis is 
located so that its object loadings account for the maximum 
amount of the common variance in the data and subsequent 
axes account for progressively diminishing amounts of 
variance. No uniquely preferred dimensionality emerged from 
the four analyses. There is, however, considerable 
consistency between the results of the two scaling methods. 
Figure 5.5 shows a plot of the first two principal 
components of the four-dimensional solutions for the /Ca/ 
and the /Ci/ data sets as yielded by both the Torgerson and 
Factor Analysis programs. The four-dimensional solution is 
chosen as being the highest one might reasonably anticipate 
for the relatively small number of objects scaled. It is 
notable that for all four solutions the first principal 
component which emerges clearly separates the fricatives and 
the affricate from the stops and the resonants, accounting 
for approximately 50% of the total variance. By rotating the 
reference axes approximately 30 degrees from the first two 
principal axes one can obtain a two factor solution 
(accounting for about 75% of the common variance) in 
substantial agreement with the Kruskal solution (Figure 
5.4a). Moreover, the transposition of the sibilants on the 
"temporal" axis for the /Ci/ set noted in the Kruskal 
solution is apparent also in the Torgerson scaling and the 
Factor Analysis. Possible reasons for this perturbation in 
the perceptual configuration are discussed in the following 
chapter. 


Fig. 5.5 First Two Principal Factors 
Experiment II Torgerson Scaling and Factor Analysis 
(Panels: Factor Analysis /Ca/ set; Factor Analysis /Ci/ set; 
Torgerson Scaling /Ca/ set; Torgerson Scaling /Ci/ set) 


The varimax and the oblique rotations of the factor 
analysis are of considerable interest. For the /Ca/ set, 
with both the varimax and the oblique solutions, the first 
two axes suggest the "duration" and "resonance - hiss" 
factors respectively (i.e., Factor I opposes the stops 
/t,d,b,p/ (in that order) to the fricatives /s,z,š/. Factor 
II has positive loadings on the resonants /n,m,l/ and 
significant negative loadings on /s,š,č,z/). 


The first two varimax and oblimax factors for the /Ci/ 
set also yield the dimensions "duration" and "resonance" 
respectively - providing one can accept that a shift in the 
relative duration amongst the sibilants caused the 
perturbation in the perceptual configuration which was noted 
earlier. The patterns of loadings on Factors III and IV are 
consistent across both orthogonal and oblique rotations but 
are not readily interpretable. Nor do they agree with 
Factors III and IV for the /Ca/ set. 


Experiment III 


If the basis of the subjects' similarity judgements 
obtained in the two previous experiments is adequately 
captured by the proposed two factor model, then from a 
knowledge of their phonetic properties, it should be 
possible to predict the location in perceptual space of 
consonants not included in the original scaling set. The new 
set of stimuli must encompass roughly the same range and 
type of auditory variability found in the original scaling 
set if the same perceptual dimensions are to have a chance 
of emerging from the scaling solution. Also, in order to 
determine whether factor invariance holds across any two 
independent scaling studies it is necessary for both sets to 
share a core of common objects. These requirements are best 
satisfied by including in the new scaling set a "common 
core" from the original set which load most highly (and 
"purely") on the hypothesised underlying factors. Thus /pa/ 
and /za/ were chosen on the basis of the Kruskal scaling 
solutions of Experiments I and II, as representing extremes 
on the "temporal" dimension. Similarly, /ča/ and /la/ were 
chosen to represent the "resonance - hiss" dimension. With 
the addition of four "new" sounds /ka,ga,wa,fa/ this 
comprised the set of stimuli for Experiment III. 


With varying degrees of certainty, the "new" sounds 
should be predictable in the perceptual space of the "old". 
The /w/ is clearly a strong resonant with a somewhat less 
abrupt onset and a slightly longer duration than /l/. The 
stops /k/ and /g/ should clearly cluster with /p/ on the 
temporal dimension. Their ranking on the resonant dimension 
is not predictable because Experiments I and II are not 
particularly consistent on this point. The location of /f/ 
is somewhat more difficult to predict. Labial frication has 
a decidedly "softer" quality than alveopalatal frication. It 
therefore may be expected that /f/ would load closer to the 
resonant pole of the "resonance - hiss" dimension than /č/ 
or /z/ (though the voice component in the latter renders 
this partially debatable). In order to ensure that the 
labial frication would not be lost in digitizing the 
signal, this sound was produced with somewhat heavier 
emphasis than it might normally obtain in "list reading" the 
items. For this reason, /f/ - in this experiment at least - 
should load with the continuants on the temporal dimension. 


The eight stimuli were recorded and the experimental
items constructed in the same manner as for Experiment II.
The scaling procedure was identical with that of the
previous experiments.


Results 


Again, Kruskal's stress criterion did not clearly
favour the two-dimensional solution, but the three-
dimensional configuration was difficult to interpret and no
higher a dimensionality would be justifiable with such a
small number of objects scaled. The two dimensional
configuration did however conform well with the predicted
scaling solution (see Figure 5.6). The "anchor point"
stimuli did not shift relative locations substantially
(though /p/ emerges with an uncomfortably high loading on
the "resonant" dimension). The locations of the "new"
consonants are, more or less, where they "ought" to be.


Fig. 5.6 Two Dimensional Kruskal Scaling Solution
Experiment III

Experiment IV was undertaken with the original set of
12 /Ca/ syllables in order to assess the impact of
experimental procedures used in constructing the proximity
matrix and also to provide additional information for
characterising the nature of the factors underlying the
derived perceptual configuration. Obviously, if perceptual
distances or proximities vary substantially and
unpredictably over different but otherwise well motivated
techniques of measuring perceptual similarity, then the
utility of the whole method is brought into question.


In Experiment IV, an indirect measure of similarity was
derived from subjects' ratings of the phonemes (embedded in
the same Ca syllabic frame) according to a set of 13 verbal
scales thought to be relevant for describing subjective
qualities of sounds in general and speech sounds in
particular. The scales chosen, mainly formulated on a "most
- least" continuum, were:

    most hissy - least hissy
    most vowel like - least vowel like
    most bright - least bright
    long - short
    most clear - least clear
    most harsh - least harsh
    most distinct - least distinct
    high pitch - low pitch
    most abrupt - least abrupt
    most even - least even
    most loud - least loud
    most melodious - least melodious
    most effortful - least effortful

In selecting these scales, an attempt was made to avoid
descriptors that possessed a specialized linguistic or
phonetic connotation. (We were not interested in examining
the subjects' academic knowledge of phonology or phonetics.)
An effort was also made to sample as broadly as possible
from the domain of discourse: in this case, "ways of
describing sounds." Also, care was taken to represent what
were hypothesised to be the underlying factors responsible
for the derived perceptual configuration in Experiment I.


Method


Thirty subjects rank-ordered the 12 syllables on each
of the 13 scales. The group testing procedure dictated that
the syllables be presented orthographically, but subjects
were instructed to make their ratings, as much as possible,
on the sound of the syllables, not on the basis of written
form or articulation. The average rank score of the stimuli
on the scales formed the basic measurement for the
subsequent analysis (see Table 5.7).

AVERAGE RANK SCORES ON RATING SCALES
EXPERIMENT IV

An indirect measure of similarity between any two
sounds in the columns of Table 5.2 was obtained by matching
their respective profiles of scores on the 13 scales.
Pearson's product moment correlation coefficient was used as
the index of profile similarity so that the matrix of
similarity scores could be factor analysed.
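
The profile-matching step can be illustrated with a minimal sketch. It assumes each stimulus is represented by its vector of average rank scores across the scales; the three short profiles below are hypothetical values made up for illustration, not the data of Experiment IV, and the function names are likewise illustrative.

```python
from math import sqrt

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def profile_similarity(profiles):
    """Return {(s1, s2): r} for every unordered pair of stimuli."""
    names = sorted(profiles)
    return {
        (a, b): pearson(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical average rank scores on four scales (NOT the thesis data).
profiles = {
    "sa": [1.5, 10.0, 9.0, 3.0],
    "za": [2.5, 9.0, 8.0, 4.0],
    "la": [11.0, 2.0, 3.0, 9.5],
}
similarity = profile_similarity(profiles)
```

With these made-up profiles the two sibilant syllables correlate strongly and positively with each other and strongly negatively with the resonant, which is the kind of structure a subsequent factor analysis of the similarity matrix would pick up.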


Results: 


Scale reliability estimates, expressed as estimated
test - retest correlation coefficients, were obtained by a
method based on analysis of variance (Winer, 1971). Table
5.3 shows the estimated reliability of average rank scores
for the phonemes on the scales.


TABLE 5.3
RANK ORDERED ESTIMATED SCALE RELIABILITIES
EXPERIMENT IV

When one asks which verbal scales the subjects
employ most consistently in rating the 12 stimuli, it is
obvious from a rank ordering of the reliability estimates
that they are just those which would be expected to
discriminate well between the phoneme targets on the basis
of the results of Experiments I and II. For example, pitch
and loudness are very well established auditory perceptual
qualities, but evidently play little role in differentiating
between these 12 consonantal phonemes.


The correlation matrix of inter-phonemic similarity
scores was factor analysed by the principal components
method, which suggested that a two dimensional configuration
was optimal for this set of data (for details see Appendix
G). Simply from inspection of the obtained configuration
(Figure 5.7) a strong correspondence with the results of
Experiments I and II is apparent. Rotation by the varimax
criterion resulted in placement of the axes more or less
where theoretical preference would have dictated. The two
dimensional configuration accounted for 81.8% of the total
variance. With the varimax rotation, 42.7% of the total
variance was attributable to Factor I, the "duration" or
"abruptness of onset" dimension, and 39.1% to the "sonority"
or "resonance - hiss" dimension.

Fig. 5.7 Factor Analysis of Phone Rating Data
Experiment IV


Furthermore, the same major three-way grouping by
manner of articulation was obtained through cluster analysis
as was found in the previous experiments.

CHAPTER VI
ANALYSIS AND DISCUSSION


In the previous chapter, based upon experiments
involving similarity scaling of consonantal phones embedded
in a CV syllabic frame, two hypothetical perceptual
dimensions - a qualitative "resonance-hiss" dimension and a
temporal factor, "duration" - were identified. The perceptual
configuration upon which this two-factor model was based was
shown to be replicable, stable over different scaling
methods, and substantially indifferent as to whether direct
or indirect similarity rating techniques are employed to
generate the matrix of perceptual proximities. A change in
rank ordering within the sibilant cluster on the putative
"temporal" dimension which occurred when a change was made in
the "carrier" vowel raised some question about the
explanatory adequacy of this factor. The two-factor model
yielded consistent predictions about the location in
perceptual space of certain stimuli not included in the
original scaling set, and thus could be said to meet minimal
requirements of factor invariance.


However, while the proposed two-factor model may be
theoretically attractive (see below) and consistent with the
data to hand, it can lay no claims to uniqueness as a way of
representing the perceptual relationships between the sounds
included in the target set. This is evident when one
considers the problems of dimensionality determination and
rotation. The writer has relied rather heavily on the
"subjective" criterion of theoretical preference to
establish the dimensionality and orientation of the
reference axes which will enable him to "adequately account
for" the derived multidimensional perceptual configuration
obtained through MDS of the input proximity matrix. In some
instances the subjective criterion is clearly supported by
the "objective" mathematical indices of the "most adequate"
solution (e.g., Experiment IV). In other instances,
particularly where the "objective" criterion fails, or none
exists, the writer has of necessity exercised liberty in the
choice of dimensionality and axis rotation.


Generally speaking, a better "fit" to the input
proximities can be obtained by choosing a solution of
greater dimensionality - or, in factor analytic terms, more
of the common variance may be accounted for by
extracting a higher number of factors. On the other hand,
low dimensional solutions, and those factors which emerge
first in a factor analysis, are more strongly determined,
and hence more reliable, than high dimensional scaling
solutions and late emerging factors.


In the face of this problem of non-uniqueness, two
general strategies appear to be open to the investigator. He
may leave temporarily unresolved the question of rotation
(and even that of dimensionality if he works simply with the
raw proximity matrix) by inquiring what kinds of variables
are most strongly associated with, or (in some experimental
manipulation) most potently affect, the relative interpoint
distances which determine the overall shape of the
perceptual configuration. This strategy was also employed in
the present study where multiple regression analysis with
different kinds of variables (phonological, perceptual, and
acoustic) was used to build predictive equations for the raw
proximities and the derived (Kruskal, two-dimensional)
interpoint distances. By examining the normalized predictor
weightings for different multiple regression equations it
should be possible to clarify the nature of the perceptual
space.


The second strategy is to attempt to find independent
empirical evidence for the factors isolated in the scaling
study. There are a variety of ways this may be done. It may
be possible to show, for example, that certain reliably
ratable perceptual qualities can predict, with a high degree
of accuracy, scores on a particular interpretive factor. In
this way a verbal characterization of the factor can be
generated based on subjects' abilities to describe
perceptual attributes of objects. If the physical correlates
of some hypothetical perceptual dimension can be isolated
and controlled (for example through synthesis in the case of
speech perception), then the investigator is in a much
stronger position to evaluate the perceptual reality and
salience of his hypothetical factors. In this (ideal)
situation, the experimenter would not only be able to
predict the scaling configuration of a set of perceptual
objects from a critical set of measurements of specific
signal properties, but be able to manipulate the shape of
the perceptual configuration by varying the relevant
synthetic signal parameters.


Physical correlates of the two perceptual dimensions
isolated in the scaling experiments were obtained (see
below) by acoustic analysis of the /Ca/ stimulus set used in
Experiment II. It was found that on the basis of these
physical correlates, the derived perceptual configuration of
the 12 sounds could be predicted with fair accuracy
(Pearson's product moment correlation between the derived
perceptual distances and interpoint distances predicted on
the basis of the physical correlates = .81).


Scores 


Earlier in this paper (Chapters I and II), the
possibility was entertained that features employed in
phonemic recognition might be uniquely linguistic, rather
than perceptual properties that mediate auditory recognition
in general. If this is so, it may be anticipated that an
"abstract" and singularly "phonological" distinctive feature
schema such as that of Jakobson, Fant, and Halle or Chomsky
and Halle should have considerable predictive power for the
interpoint distances derived from MDS of perceptual
proximities. Moreover, the raw proximity scores themselves
should be predictable from a knowledge of the distinctive
feature specification of the sounds.


On the other hand, results of the four MDS scaling
experiments reported above suggest that the bulk of the
systematic variation in the derived perceptual configuration
is attributable to just two factors that seem to represent
general auditory features of no unique linguistic character,
and for which it is unnecessary to invoke any specific
linguistic adaptation or specialization of the perceptual
mechanism. This, however, could be a mistaken impression,
made possible by the non-uniqueness of the factor solution.
A fair test of the utility of any proposed feature schema in
predicting perceptual distances can be obtained through
multiple regression analysis, if it can be assumed that the
interpoint distance between two phonemic targets is a simple
additive function of the number of contrasting feature
values that serve to distinguish the two targets.


$$D_{jk} = r_1\,|F_{j1} - F_{k1}| + r_2\,|F_{j2} - F_{k2}| + \cdots + r_n\,|F_{jn} - F_{kn}|$$

where $D_{jk}$ = perceptual distance between targets j and k
      $F_{jn}$ = value of feature n for target j
      $r_n$ = normalized regression weight for feature n
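
As a rough illustration of this additive model, the sketch below builds the per-feature contrasts |F_jn - F_kn| for every pair of targets and sums them with a set of weights. The three binary features, the four targets, and the weights are all hypothetical and chosen only to show the mechanics; in the analyses reported here the weights are estimated by stepwise regression rather than fixed in advance.

```python
from itertools import combinations

def feature_contrasts(features):
    """For each pair (j, k), list |F_jn - F_kn| for every feature n."""
    return {
        (j, k): [abs(a - b) for a, b in zip(features[j], features[k])]
        for j, k in combinations(sorted(features), 2)
    }

def predicted_distance(contrasts, weights):
    """D_jk = sum over n of r_n * |F_jn - F_kn|."""
    return sum(r * c for r, c in zip(weights, contrasts))

# Hypothetical binary specifications: (sibilant, stop, voiced).
features = {
    "s": (1, 0, 0),
    "z": (1, 0, 1),
    "p": (0, 1, 0),
    "b": (0, 1, 1),
}
weights = (0.7, 0.4, 0.1)  # illustrative weights, not fitted values

contrasts = feature_contrasts(features)
distances = {pair: predicted_distance(c, weights) for pair, c in contrasts.items()}
```

The contrast vectors are exactly the predictor variables of the regression: one column per feature, one row per pair of targets, with the observed distance or proximity for that pair as the criterion.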


Phonological feature systems do not make explicit
claims for the relative importance of their component
features, though some rank ordering of the features is often
implied. The primary task of the regression analysis is to
determine what the relative independent contribution of each
feature in the system is for the particular perceptual
configuration in question. The model is of course applicable
with any a priori set of features, categorical or scalar. A
stepwise multiple regression routine was used in the
following analysis. The model can be expected to maximize
the contribution of a few independent features to the
prediction of the interpoint distances. The regression
weights assigned to features in subsequent steps of the
regression analysis will be highly influenced by the
particular features extracted in previous steps of the
analysis.


Subjects' responses to the /Ca/ set of stimuli in
Experiment II were used in the following regression
analyses. This set was chosen because it was based on a
larger sample and the stimuli were better controlled than in
Experiment I. Five phonological feature systems were
independently evaluated by the regression analyses with both
the Kruskal two-dimensional interpoint distances and the
raw proximity scores as criterion variables. Feature System
I was that of Jakobson, Fant, and Halle. System II was that
of Chomsky and Halle. System III was the one used by Singh
and Black (1968) which, for the present set of stimuli, is
identical with the Miller and Nicely system except for the
addition of a single feature (Liquid). The notable
characteristic of System IV is that it incorporates three
binary manner features, one for each of the major perceptual
clusters first observed in similarity rating data by Peters
(1963) and subsequently found in Black (1968), Stevens and
House (1971), as well as the data of the present
experiments. System V contains two trinary features
corresponding to the hypothetical two-factor basis for the
derived perceptual configuration of the present study.

TABLE 6.1
FEATURE SYSTEMS EMPLOYED IN REGRESSION ANALYSES


Results of the various stepwise multiple regression
analyses are summarised in Figures 6.1a and 6.1b (for
details of the analyses see Appendix H). The contribution of
each feature in the final equation to the prediction of the
criterion variable is indicated by the bar graph where the
change in the squared multiple correlation coefficient which
is associated with the feature in question is plotted on the
y-axis. This measure provides an estimate of the variance
contribution of each feature in the final equation and may
be expressed as a percentage. In interpreting these results,
it should be remembered that the features are being applied
only to a subset of the phonemic inventory and consequently
some (such as Consonantal) do not have a chance to apply,
while others (such as Strident) do not have their domain of
reference adequately specified.


It is remarkable that virtually all of the predictive
power of the Jakobson, Fant, and Halle, and the Chomsky and
Halle feature systems can be accounted for by a single
feature which opposes the sibilants to all the other
consonants. This feature corresponds to the first factor
proximity scores. No other feature accounts for more than 5%
of the variance, which is roughly the cut-off level for
deciding whether a feature makes a significant independent
contribution to the prediction equation.


The Singh and Black (Miller and Nicely) feature system
has two significant contributors to the prediction of the
interpoint distances and the proximities. The relative
importance of Duration and Frication for the distances is
reversed for the proximities as criterion. Analysis IV once
again shows the overwhelming importance of the sibilance
contrast to both criterion variables. The other two "manner"
features - Resonance and Stop - make significant independent
contributions to the interpoint distances, but Resonance
gives way to Voicing in the prediction of the proximity
scores. Feature System V, as would be expected, distributes
the predictable variance more equally between the two
trinary features of Duration and Resonance-Hiss. In terms of
overall predictive power, there are no grounds for choosing
between Feature Systems IV and V.


Generally speaking, the interpoint distances are more
predictable than the raw proximity scores (R is, on the
average, five percentage points higher for the interpoint
distances). This may be due to a certain "cleaning up"
effect attributable to the scaling algorithm where some of
the error present in the raw proximity scores is corrected
for when each interpoint distance is determined as a
function of all the other interpoint distances in the
configuration. However, a certain amount of information loss
also seems to occur when the proximities are transformed
into distances in a two dimensional space. The Voicing
feature appears in all five prediction equations when the
proximity scores constitute the criterion but in none of the
corresponding equations for the interpoint distances.


Perceptual Rating Scales as Prediction Variables 


The somewhat higher predictability of the interpoint
distances is also indicated when the perceptual rating
scales of Experiment IV are used as predictors of the
interpoint distances and raw proximity scores of Experiment
II. The results of this regression analysis are summarized
in Figure 6.2 (details in Appendix I). The table of average
rank scores (Table 5.7) collected for Experiment IV served
as the basis for correlating rating scale scores with
interpoint distances and proximity scores. Ideally, the
actual stimuli used in Experiment II should have been used
rather than the rankings obtained in Experiment IV under
non-auditory stimulus presentation. However, the latter data
were readily at hand and while the final level of prediction
might have been somewhat higher if the stimuli of Experiment
II had been auditorily presented while subjects made their
ratings, it is doubtful that the pattern of regression
weightings would have been significantly different.


Fig. 6.2 13 Perceptual Scales as Predictors
Derived Distances and Raw Proximities
(Panels: Kruskal interpoint distances as criterion; raw
proximities as criterion. Vertical axis: variance
contribution.)


The "Hissy" scale dominates the prediction equation as
might be expected of a dimension that clearly and reliably
distinguishes the sibilant consonants from the rest. The
"Abrupt" scale is second in importance for both the
distances and the proximities. "Vowel-like" appears to
contribute significantly to the prediction of the interpoint
distances but not the proximity scores - just as the binary
Resonance feature in regression analysis IV in Figure 6.1
above seems to be of slightly less importance for the
proximities than the distances.

The important question to be answered with the help of
these regression analyses is what impact they have on the
tenability of the two-factor model proposed earlier on the
basis of the MDS studies. The implications appear to be the
same as those of the principal components analysis (Figure
5.7). At a cost of sacrificing 10 - 18% of the predictable
variation in the interpoint distances (4 - 8% in the case of
the proximities), the temporal dimension may be abandoned
and the bipolar resonance-hiss dimension collapsed into a
monopolar hiss-nonhiss or Sibilance factor. Acceptance of
this option would simplify the factor structure even
further, but it seems to throw away important information
contained in the perceptual structure. However, the
strongest grounds for reluctance to accept the single factor
solution stem from analysis of the acoustic correlates of
the two hypothesised factors of Duration and Resonance-Hiss
reported in the following section.

The approach that was used with the phonological
feature systems and the perceptual rating scales, of
attempting to predict interpoint distances or raw
proximities from sets of possibly relevant predictor
variables, was not employed in the case of the physical
correlates of the perceptual configuration. It was felt that
acoustic variables was too formidable. In certain restricted
areas, such as the perception of steady state vowels (Pols,
1969) where all relevant information must be restricted to
the spectral domain, it is feasible to think in terms of
obtaining an unbiased and exhaustive sample of the total
"acoustic space". However, with the time-varying spectral
functions of consonantal sounds, the problem of unbiased
sampling seems to be too open ended. On the other hand, the
hypothetical dimensions extracted from the MDS analyses did
suggest specific characteristics of the signals that
subjects appeared to be using in making their similarity
judgements. Attention was therefore focused upon physical
correlates of the hypothesised perceptual dimensions rather
than the (uninterpreted) interpoint distances.


As a starting point for the acoustic analysis, broad
band spectrograms (b.w.=300 Hz; range = .016 - 16kHz; Kay
Electro-Sonagraph) and high temporal resolution oscillograms
(via computer controlled read out of the digitalized
signals; see Roszypali, 1973) were made of the 12 /Ca/ and
the 12 /Ci/ post digitalized stimuli used in Experiment II
(see Appendix J). The /Ca/ stimulus set, which had the
clearest perceptual structure, was chosen for detailed
measurement and analysis. The axes of the Kruskal two-
dimensional configuration for the /Ca/ set were graphically
rotated (preserving orthogonality) to what appeared to be
the theoretically most satisfying orientation, and the
loadings of each stimulus on the rotated dimensions were
recorded (see Table 6.2).


TABLE 6.2
STIMULUS LOADINGS ON ROTATED KRUSKAL DIMENSIONS

STIMULUS            LOADING
             RES-HISS    DURATION
sa            -0.66        1.02
pa             0.50       -0.95
ča            -1.19        0.01
ma             0.81       -0.19
ta            -0.59       -0.95
la             0.97        0.08
da            -0.21       -0.74
ha             0.54        0.505
ša            -0.94        0.66
za            -0.49        1.21
na             0.84        0.22
ba             0.37       -0.80
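
A rigid rotation of the kind used to obtain the loadings in Table 6.2 changes the coordinates of each stimulus but leaves every interpoint distance untouched, which is why the choice of orientation is a matter of interpretation rather than of fit. The sketch below checks this for three stimuli whose loadings are taken from Table 6.2; the 30 degree angle is arbitrary and is not the rotation actually applied in the study.

```python
from math import cos, sin, radians, dist

def rotate(point, degrees):
    """Rotate a 2-D point counter-clockwise about the origin."""
    t = radians(degrees)
    x, y = point
    return (x * cos(t) - y * sin(t), x * sin(t) + y * cos(t))

# Loadings (Res-Hiss, Duration) for three stimuli as listed in Table 6.2.
config = {"sa": (-0.66, 1.02), "pa": (0.50, -0.95), "la": (0.97, 0.08)}

# 30 degrees is an arbitrary illustrative angle.
rotated = {name: rotate(p, 30.0) for name, p in config.items()}

before = dist(config["sa"], config["pa"])
after = dist(rotated["sa"], rotated["pa"])
```

Here `before` and `after` agree to within floating-point rounding, while the individual coordinates (and hence the "loadings" read off the axes) differ.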
There was no difficulty finding acoustic correlates of
the temporal dimension. It correlated .95 with the physical
duration of the consonantal portion of the syllable and .94
with the duration of the whole syllable. (Segmentation was
not difficult; see Appendix J for the measurements.) It
would be unreasonable to expect results clearer than these.


The Resonance-Hiss dimension, however, posed some
difficulties. It may be grossly, but rather inaccurately,
characterised by a separation of high and low frequency
spectral energy bands. All the resonants (with the notable
exception of /h/) have a low frequency, periodic glottal
energy source. Sounds at the other end of this dimension are
characterised by relatively high frequency spectral energy.
However, the quality or type of energy present appears to be
relevant and not merely its locus in the frequency domain.
The syllables /la/ and /ča/ take extreme values on the
Resonance-Hiss dimension yet both have substantial energy
concentrations in the 3-5 kHz band. The crucial difference
would seem to be that in one case the spectral energy
distribution is highly organized in terms of harmonic and
formant structure, but in the other it is random, or lacking
in any spectral organization.


The basic problem in obtaining some satisfactory
acoustic correlate of the Resonance-Hiss dimension resided
in the fact that currently available acoustic analysing
devices are ill suited to detecting such a distinction,
though the human ear and other biological sound analysing
devices (Suga, 1972) apparently are not. Instrumental
limitations therefore led to a choice, as the best practical
approximation to an adequate physical characterisation of
this dimension, of a simple bandfilter function that
optimally predicted the Resonance-Hiss factor loadings.


The Resonance (or, more accurately, the Vocalic) and 
the Hiss components of the stimuli were extracted separately 
by simultaneous low-pass (LP) and band-pass (BP) filtering 
of the post-digitalized stimuli. Optimal filter settings (LP 
< 200 Hz, -48dB/octave, Rockland Programmable Filter series 


1520; Bp = 2.7-5.6 kHz, -32dB/octave, Audio Frequency Filter 
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type 400) that would maximally differentiate the stimuli on 
the Resonance-Hiss dimension loadings were determined on the 
basis of spectrographic analysis and trial and error. The 
output of each spectrai band filter was fed into a dual 
Channel Frokjar-Jennson Intensity Meter, operating on an 
integration time base of 20 msec. This provided a smooth but 
sufficiently time sensitive intensity trace which was 
recorded On an Elma-Schonander four channel Mingograph at an 
Operating speed of 100 mm/sec. The intensity meter provides 
for either a linear or a logarithmic scale for registering 
intensity over an operating range of 50dB. The logarithmic 
setting has the effect of magnifying differences at the low 
intensity levels of registration at the expense of 
differences at higher intensity levels. For purposes of 
clear segmentation into consonantal and vowel portions of 
the syllables, a Duplex Oscillogram was also obtained. For 


the instrumental configuration see Figure 6.3 below. 
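
The effect of the two meter scales can be illustrated with a minimal sketch. It assumes a digitised band-filter output held as a list of samples, and it approximates the meter's 20 msec integration by the RMS level of consecutive, non-overlapping 20 msec frames. The function name, the 10 kHz sampling rate, and the handling of the 50 dB range are illustrative assumptions and do not describe the actual instrument.

```python
import math

def intensity_trace(samples, sample_rate, window_ms=20.0,
                    log_scale=False, range_db=50.0):
    """Return one intensity value per consecutive window.

    Linear scale: RMS amplitude of each window.
    Log scale: level in dB re. amplitude 1.0, clipped at -range_db
    to mimic a limited operating range (an assumption).
    """
    win = max(1, int(round(sample_rate * window_ms / 1000.0)))
    trace = []
    for start in range(0, len(samples) - win + 1, win):
        frame = samples[start:start + win]
        rms = math.sqrt(sum(s * s for s in frame) / win)
        if log_scale:
            level = 20.0 * math.log10(rms) if rms > 0.0 else -range_db
            trace.append(max(level, -range_db))
        else:
            trace.append(rms)
    return trace

# Hypothetical signal: three 20 msec stretches at amplitudes 0.01, 0.1, 1.0.
signal = [0.01] * 200 + [0.1] * 200 + [1.0] * 200
linear_trace = intensity_trace(signal, 10000)
log_trace = intensity_trace(signal, 10000, log_scale=True)
```

On the linear trace the step between the two weakest stretches is much smaller than the step between the two strongest, whereas on the logarithmic trace the two steps are equal in dB. This is the sense in which the logarithmic setting magnifies differences at low intensity levels.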


It was hypothesised that the stimulus loadings on the 
Resonance-Hiss dimension could be adequately approximated by 
some linear combination of the low and high frequency band 
output levels, where the weightings of the two bands would be 
opposite in sign (see following equation): 

PRES = a(LP output) - b(BP output) + c 

where PRES = least squares match with loadings on the 
             Resonance-Hiss dimension 
      a    = low frequency band output weighting 
      b    = high frequency band output weighting 
      c    = some arbitrary (uninterpreted) constant. 
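
The hypothesised linear combination can be sketched as an ordinary least-squares fit. The sketch below is illustrative only: the function names are hypothetical, the six data points are invented rather than taken from the measurements, and the weights are obtained by solving the normal equations directly. The multiple regression coefficient is taken as the correlation between predicted and observed loadings.

```python
import math

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fit_pres(lp, bp, loadings):
    """Least-squares a, b, c in PRES = a*LP - b*BP + c."""
    rows = [[low, high, 1.0] for low, high in zip(lp, bp)]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
           for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, loadings)) for i in range(3)]
    w_lp, w_bp, const = solve_linear(xtx, xty)
    return w_lp, -w_bp, const  # b carries the sign used in the equation

def pearson(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

# Invented band outputs for six stimuli (NOT the thesis measurements).
lp_out = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
bp_out = [2.0, 1.0, 4.0, 3.0, 6.0, 2.0]
loadings = [0.5 * low - 1.2 * high + 0.3 for low, high in zip(lp_out, bp_out)]

a, b, c = fit_pres(lp_out, bp_out, loadings)
pres = [a * low - b * high + c for low, high in zip(lp_out, bp_out)]
multiple_r = pearson(pres, loadings)
```

Because the invented loadings are an exact linear function of the two band outputs, the fit recovers the generating weights and the multiple correlation is unity; real measurements would of course leave a residual.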
Figure 6.3 Instrumentation for Acoustic Analysis of 
"Resonance-Hiss" Dimension 


Regression analysis was used to determine the regression 
weights and constant in this equation which would optimally 
predict the Resonance-Hiss loadings. Four sets of predictor 
variables were tried, based upon the linear or the 
logarithmic intensity meter scale outputs and upon whether 
the peak amplitude or the area function (total energy) of the 
traces for the consonantal portion of the syllable was 
measured. Results for the four sets of predictor variables 
did not differ greatly (see Table 6.3, Appendix K for 
details). The peak amplitude measurements resulted in 
slightly better predictions than the area function (total 
energy) measurements, and the log scale fared somewhat 
better than the linear scale. 


TABLE 6.3 

PHYSICAL PREDICTORS OF "RESONANCE-HISS" 
DIMENSION LOADINGS 

MEASUREMENT       SCALE    MULTIPLE REGRESSION 
                           COEFFICIENT 

AREA FUNCTION     LIN.     .84 
AREA FUNCTION     LOG.     .85 
PEAK AMPLITUDE    LIN.     .89 
PEAK AMPLITUDE    LOG.     [illegible] 


The normalized regression equation for the prediction of the 
Resonance-Hiss dimension loadings in terms of the band-pass, 
log scale, peak amplitude (BPGP), and the low-pass, log 
scale, peak amplitude (LPGP) was: 


PRES 


Note: This regression equation was derived from measurements 
of the oscillograph trace deflections from the baseline. The 
units are therefore arbitrary. They may be converted into dB 
ratings by means of the calibration curves for the two 
traces given in Appendix L. 

As the above equation indicates, the peak intensity level of 
the high frequency band is considerably more important for 


the prediction of the factor loadings than the low band 


peak. 


Although the Resonance-Hiss dimension loadings may be 
quite successfully predicted by the simple bandfilter 
function developed in this study, there is some doubt that 
this perceptual continuum is correctly characterized in 
terms of an energy by frequency-band analysis. For example, 
the bandfilter analysis fails to predict the high loading of 
/la/ on the "resonant" pole of the Resonance-Hiss continuum 
because it takes no account of the kind of energy present in 


the 3-5 kHz band. It fails to distinguish spectrally 
coherent signals from those that lack organization in the 
frequency domain. Such organization could be due to the 
nature of the source signal (the spectral coherence of the 
glottal tone), or the "shaping" function of a resonator, or, 


conceivably, to both of these factors. 


With this in mind, as an alternative to using the low 
frequency voicing component (LPGP) to capture the Resonance 
pole of the Resonance-Hiss continuum, an attempt was made to 
quantify the notion of spectral organization, or "formant 
structure". Two trained phoneticians, experienced in 
spectrographic analysis of speech, were asked to rate the 
spectrograms of the 12 /Ca/ stimuli for the presence of 
"formant structure" on a four point scale ranging from 
"strongly apparent" to "no detectable formant structure". 
The two sets of independent ratings (inter-rater reliability 
estimated at rho = .86) were averaged and entered into a 
regression equation with the other predictor variable, the 
high-band peak amplitude, BPGP. This resulted in a multiple 
regression coefficient of R = .92 for the Resonance-Hiss 
loadings (see Appendix K for details). Table 6.4 shows that 
the low-band peak amplitude (LPGP) and the average formant 
structure ratings (FRAV) are significantly intercorrelated, 
but both of these variables are sufficiently independent of 
the high-band peak amplitude (BPGP) to justify combining 
either FRAV or LPGP with BPGP to define a single, bipolar, 
Resonance-Hiss dimension. The notion of spectral coherence 
seems to play a significant role in the prediction of the 
Resonance-Hiss dimension and, by implication, in the 
determination of perceptual structure. But because it could 
not be fully operationalized (i.e. instrumentally measured) 
this variable was not used as a physical predictor in the 
reconstruction of the perceptual configuration. 


TABLE 6.4 

                                       BPGP     LPGP 
HIGH BAND PEAK AMPLITUDE        BPGP 
LOW BAND PEAK AMPLITUDE         LPGP   -.306 
MEAN FORMANT STRUCTURE RATING   FRAV   -.306    -.623 
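
The rating procedure described above (two independent sets of ratings, a reliability estimate, and averaging) can be sketched as follows. The sketch assumes that "rho" denotes Spearman's rank correlation, which the text does not state explicitly; the ratings are invented for illustration and the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Invented four-point ratings of 12 spectrograms by two raters.
rater_a = [4, 4, 3, 3, 2, 2, 1, 1, 4, 3, 2, 1]
rater_b = [4, 3, 3, 4, 2, 1, 1, 2, 4, 3, 2, 1]

rho = spearman_rho(rater_a, rater_b)
mean_rating = [(p + q) / 2.0 for p, q in zip(rater_a, rater_b)]
```

The averaged ratings would then take the place of one predictor in a two-predictor regression of the kind used for the bandfilter function.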


Figure 6.4 indicates how well the two-dimensional 
Kruskal configuration for the stimuli may be predicted from 
the physical correlates of the two hypothetical perceptual 
dimensions - the bandfilter regression function, and the 
physical duration of the consonantal portion of the 
syllable. As previously mentioned, the two sets of 
interpoint distances in Figure 6.4 are quite highly 


correlated (r = .81). 
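
The comparison of the two sets of interpoint distances can be sketched in a few lines. The two configurations below are invented five-point examples rather than the 12-stimulus configurations of the study, and the function names are hypothetical; the sketch shows only how a correlation between two sets of interpoint distances is obtained.

```python
import math
from itertools import combinations

def interpoint_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.sqrt(sum((u - v) ** 2 for u, v in zip(p, q)))
            for p, q in combinations(points, 2)]

def pearson(x, y):
    """Pearson product moment correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((p - mx) * (q - my) for p, q in zip(x, y))
    sxx = sum((p - mx) ** 2 for p in x)
    syy = sum((q - my) ** 2 for q in y)
    return sxy / math.sqrt(sxx * syy)

# Invented two-dimensional configurations (NOT the thesis data).
perceptual = [(-1.2, 0.4), (-0.8, -0.9), (0.3, 1.1), (1.0, -0.2), (0.9, 0.8)]
predicted = [(-1.0, 0.5), (-0.9, -0.6), (0.1, 0.9), (1.1, 0.0), (0.7, 0.9)]

d_perceptual = interpoint_distances(perceptual)
d_predicted = interpoint_distances(predicted)
r = pearson(d_perceptual, d_predicted)
```

Comparing interpoint distances rather than raw co-ordinates makes the measure insensitive to rotation, reflection, and translation of either configuration. With 12 stimuli there are 66 distinct pairs entering the correlation.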


Figure 6.4 Reconstruction of two-dimensional perceptual 
configuration on basis of physical predictor variables. 
Arrows indicate magnitude of discrepancy between perceptual 
configuration (Kruskal scaling) and configuration predicted 
on basis of physical duration and bandfilter function. 


A recent experiment by Pols (1974) on the physical 
correlates of Dutch CVC syllables provides an interesting 
corroboration of the physical Resonance-Hiss dimension 
isolated in this study. He bandfiltered 270 spoken CVC 
syllables with 17 parallel filters whose bandwidths 
approximately matched the frequency resolution of the human 
ear. The output intensity of each filter was sampled every 
10 msec. Each of the 14,111 resulting 10 msec. samples may 
be described as a point in a 17 dimensional space, having 
co-ordinate values equal to the levels of the 17 filters. 
These variables were subjected to a Principal Components 
analysis. The first factor which emerged (accounting for 
55.1% of the total variance) was "a very efficient 
discriminator between sonorant and non-sonorant sounds." The 
nature of this factor may be illustrated by the graph of the 
first eigenvector in Figure 6.5. Peak values in the 
eigenvector closely correspond with the optimal centre 
frequencies in the high and low frequency bands that were 
used to predict the Resonance-Hiss loadings in the present 
study. Unlike the present study, Pols' speech samples were 
drawn from vowel as well as consonantal segments of 
syllables, and this suggests a perceptual role for the 
Resonance-Hiss dimension that could only be a speculative 
extrapolation from the present data, but which Pols 
explicitly demonstrated by example (see Pols, 1974, p.90). 
Specifically, the Resonance-Hiss dimension proved to be a 
highly effective basis for vowel-consonant segmentation, 
which is often recognized in such diverse areas as automatic 
speech recognition and phonological theory as the most 
fundamental distinction in the segmental analysis of speech. 


Figure 6.5 ... matrix of bandfilter spectra (from Pols, 1974) 
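
The kind of analysis attributed to Pols can be illustrated in miniature. The sketch below extracts the first principal component of a set of filter-level samples by power iteration on the covariance matrix; it uses three invented "bands" in place of 17 filters, six invented samples in place of 14,111, and hypothetical function names. It assumes the data have non-zero variance and is not a reconstruction of Pols' procedure.

```python
import math

def first_principal_component(samples, iterations=500):
    """Return (unit eigenvector, share of total variance) for the
    largest principal component, found by power iteration."""
    n, d = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(d)]
    centred = [[s[j] - means[j] for j in range(d)] for s in samples]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Start from the covariance column of the highest-variance variable.
    start = max(range(d), key=lambda j: cov[j][j])
    vec = [cov[i][start] for i in range(d)]
    for _ in range(iterations):
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
        vec = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
    eigenvalue = math.sqrt(sum(v * v for v in vec))
    vec = [v / eigenvalue for v in vec]
    share = eigenvalue / sum(cov[j][j] for j in range(d))
    return vec, share

# Invented levels in a low, a middle, and a high band: three
# "sonorant-like" frames followed by three "hiss-like" frames.
frames = [(9.0, 5.0, 1.0), (8.0, 5.0, 2.0), (9.0, 4.0, 1.0),
          (2.0, 5.0, 8.0), (1.0, 5.0, 9.0), (2.0, 4.0, 9.0)]

eigenvector, variance_share = first_principal_component(frames)
```

In this toy case the low-band and high-band components of the first eigenvector take opposite signs, which is the pattern a bipolar low-band versus high-band dimension would be expected to produce.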


Although the Resonance-Hiss dimension emerges strongly 
in both the scaling solutions and the regression analyses in 
the present data, the status of the temporal dimension is 
more equivocal. It appears to account for a good deal less 
of the variance in the perceptual configuration and is not 
stable over a change in the "carrier" vowel. 


To show that the transposition of the sibilants on the 
second dimension of the perceptual configuration (Figure 
5.4) is consistent with the interpretation that it 
represents a "consonantal duration" or "abruptness of 
syllable onset" factor, it will be necessary to show that 
this change is a perceptually real phenomenon. A verbal 
rating scale experiment, similar to Experiment IV of the 
present study, but with /Ci/ syllables, could provide the 
necessary evidence. Measurement of the 
duration of the consonantal segments of the /Ci/ stimuli 
showed that fe/ was considerably longer in this than in the 
/Ca/ set. However, the correlation between the physical 
duration and loadings on the temporal dimension of the 
Kruskal scaling solution dropped from .95 for the /Ca/ set 
to .75 for the /Ci/ set. 


One hypothesis that could explain the perturbation in 
the perceptual configuration with the change in carrier 
vowel is that the perceptual prominence of certain auditory 
features relevant to the recognition of the consonant is 
subject to differential backward masking effects by 
different vowels. This hypothesis gains plausibility when 
coupled with the suggestion that the two-dimensional 
representation favoured by the analysis thus far may be an 
over-reduction of the proximity matrix. In other words, 
features that went undetected by the MDS analysis may have 
been differentially enhanced or suppressed in interaction 
with the carrier vowel. In response to this suggestion it 
can only be observed that for reasons of mathematical 
determinacy and replicability of findings with MDS, it is 
advisable to keep the dimensionality of the solution low. In 
this way one may be assured of capturing at least the most 
important of the feature dimensions contained in the 
proximity matrix. 

Finally, the possibility cannot be dismissed that the 
problematical loadings of the sibilants on the "temporal" 
dimension for the /Ci/ set are an experimental artifact. It 
is apparent from the oscillograph tracings (Appendix J) that 
there is some temporal clipping of the carrier vowel in the 
longest syllables of the /Ci/ set. This occurred because of 
time sample limitations in the gating program used for 
digitalized storage of the stimuli. The clipping was noticed 
at the time the stimuli were constructed but, because it was 
barely detectable in playback, it was not judged to be a 
potentially significant influence on the subjects' ratings. 
In retrospect this may have been a mistake. In any event, it 
clouds a potentially important point in the analysis of the 
data of Experiment II. 
Broader Discussion of Findings 


Quite clearly it would be simplistic to suggest that 
the two perceptual dimensions identified in these 
experiments comprise the necessary and sufficient set of 
auditory features that listeners employ for consonantal 
phonemic recognition. Perhaps the most obvious objection to 
this interpretation of the results of the present study is 
that the range of stimuli is too restrictive. The use of 
only 12 stimuli per scaling set quite severely restricts the 
upper limit on the number of readily interpretable and 
reliable dimensions that is likely to be obtainable. On the 
other hand, an effort was made to sample broadly from the 
domain of auditory variability manifest in the consonantal 
sounds of English, so that those dimensions which are 
obtained should be the major ones and demonstrable in larger 
studies which more adequately represent that set. In this 
connection, the similarity rating studies of Black (1968) 
and Singh, Woods, and Becker (1972) are important because of 
the large number of stimuli they employed. 


The plot of the first two principal components of 
Black's solution (Figure 3.5) shows that (within rotational 
invariance) his data agree quite well with the two 
dimensions isolated in the present study. Singh et al.'s 
(1972) results (Figure 3.6) are more problematical. The ABX 
condition, which comes closest to the triadic comparisons 
scaling technique used in the present experiments, yields a 
two-dimensional configuration that generally matches 
expectations of the two-factor model. Clearly though, there 
is poor agreement between the Duration and Resonance-Hiss 
model and the results of Magnitude Estimation (ME) and 
seven-point (SF) scaling. Lack of raw data for the other 
reported studies of consonantal similarity rating (Peters, 
1963; Pruzansky, 1970) makes comparisons more difficult. But 
from the authors' own reports (Chapter III, pages 56 and 
61), it seems that there is substantial agreement with the 
findings of the present study in obtaining a perceptual 
configuration where the consonants cluster by traditional 
"manner of articulation" groupings in a space definable by 
two orthogonal factors of sound duration and quality. 


In assessing the theoretical significance of the 
present findings, it may be crucially important to note the 
agreement between Graham and House's (1970) data (see Figure 
3.2) on perceptual confusions of young children and the 
perceptual structure predicted by the two-factor model. 
Admittedly, there are some discrepancies in the obtained 
perceptual configuration, but considering the nature of the 
data (and the fairly high "stress" rating) this is only to 
be expected. The Graham and House experiment is one of very 
few reported studies on the development of perceptual 
capabilities for speech recognition at the phonological 
level. (Most developmental investigations - probably for 
methodological reasons - have concentrated upon the 
acquisition of phonological contrasts in speech production.) 


Developmental data are, however, of vital theoretical 
interest because the order of acquisition of different 
phonemic contrasts can potentially provide important 
information about the perceptual processes underlying 
phonemic recognition in the "linguistically competent" 
adult. One may reasonably hypothesise that those phonemic 
contrasts which are mastered at a very early age correspond 
with auditory distinctions that are most "natural" for the 
perceptual apparatus - discriminations made with high 
reliability without need of extensive "ear training". On the 
other hand, late emerging phonemic contrasts (of which 
Voicing has been claimed to be one of the last: Shvachkin, 
1948, in Ferguson et al., 1973; also, Garnica, 1971) would 
presumably constitute "difficult cases" for the perceptual 
apparatus, requiring perhaps highly complex perceptual 
processing of the signal beyond some "primary auditory" 
level of neural representation. If this developmental 
hypothesis is correct, then the spatial structure of 
confusion matrices obtained from subjects whose perceptual 
capabilities for phonemic recognition are incompletely 
developed should largely reflect those primary auditory 
dimensions that are most salient to the relatively "language 
naive" ear. Thus arguably, the agreement between the Graham 
and House data and the findings of the present study 
supports the hypothesis that the most important determinants 
of the perceptual configuration obtained in the present 
study are not specifically linguistic distinctive features, 
but perceptual dimensions that may subserve auditory 
recognition in general (Hypothesis II in Chapter I). This 
conclusion is also indicated by the notable failure of 
"abstract" phonological feature systems to contribute 
significantly to the prediction of the interpoint distances 
or the proximity scores - beyond what might be anticipated 
from the two-factor model. 


It may be argued that the experimental results are not 
counterindicative of the existence of specifically 
linguistic feature detection in phonemic perception, but 
merely that such features fail to show up in the MDS of 
similarity judgements. It does in fact seem that a rather 
low order of perceptual processing is being tapped by these 
experiments. Literally interpreted, the two-factor model may 
be characterised as a device which reliably segments the 
acoustic signal into broad perceptual categories - 
sufficient to distinguish vocalic from consonantal segments, 
and within the latter class, to differentiate "manner of 
articulation" groupings - Stops, Resonants, Sibilants, (Soft 
Fricatives?). These groupings roughly suggest the probable 
limits on the resolving power of the simple two-factor 
model. More complex perceptual decoding would seem to be 
required to reach the level of phonemic and phonetic 
resolution characteristic of the "perceptual competence" of 
the native listener. 
It is not unreasonable to suggest that a good deal of 
perceptual learning (specific linguistic training of the 
auditory perceptual apparatus) is required before listeners 
can readily differentiate phonemic targets within the major 
manner groupings yielded by these experiments. It is well 
known, for example, that the stop consonants require for 
their mutual differentiation acoustic cues such as formant 
transitions which are context dependent and therefore 
require a complex mapping between acoustic signal and 
perceptual target that some writers (Liberman et al., 1967) 
have labeled "encoded". These same kinds of cues apparently 
play a significant, though progressively less important, role 
for mutual discrimination within the resonant and fricative 
consonantal sub-groups. Correspondingly, dichotic listening 
studies show a significant but progressively decreasing 
right ear (left hemisphere) effect for natural (or 
synthetic) stop consonants (Shankweiler & Studdert-Kennedy, 
1967; Studdert-Kennedy & Shankweiler, 1970), resonants 
(Haggard, 1971), and fricatives (Darwin, 1971). 


Interestingly, no dichotic listening studies have 
reported lateralization effects for selected sets of stimuli 
drawn from across rather than within the perceptual 
groupings found in the present experiments. It would be 
predicted that for such stimulus sets no significant 
lateralization effect would be found. Even within these 
perceptual clusters, the relative strength of the 
lateralization effect for particular phonemic targets may be 
predictable. Note for instance that /t/ is separated quite 
distinctly from the other stops /p,b,d/ on the Resonance-Hiss 
dimension, presumably by its stronger "hard aspiration" (see 
Figures 5.3, 5.4). It may therefore be expected to yield a 
correspondingly weaker lateralization effect than the other 
stop consonants. Studdert-Kennedy and Shankweiler (1970) 
found in fact that of the six stop consonants /p,b,t,d,k,g/, 
/t/ ranked lowest in terms of lateralization effect under 
dichotic listening. These results do not imply that the left 
hemisphere is uniquely specialized for the detection of 
certain kinds of phonetic or phonemic features. 


Recent experiments (Carmon and Natchson, 1973; Papcun, 
Krashen, Terbeek, Remington, and Harshman, 1974) have 
obtained right ear superiority for the dichotic perception 
of clearly non-speech stimuli and, on balance, the evidence 
suggests that a general facility with the extraction of 
temporal sequencing may be important for explaining the 
lateralization of certain kinds of speech sounds, rather 
than some speech or language specific perceptual capability. 
Whatever the nature of the relevant differential hemispheric 
capability may be, the fact that a perceptual learning 
factor is important is strongly indicated. (Compare the 
performance of novice vs. experienced Morse code operators 
in the dichotic perception of Morse code signals in Papcun 
et al., 1974.) 


In short, the "distinctive feature" contrasts which 
produce lateralization effects in dichotic listening appear 
to be those for which the auditory system requires special 
adaptation, which is presumably obtained through perceptual 
learning at some early stage of language acquisition. These, 
however, are not the prominent perceptual contrasts that 
emerged in this study. On the contrary, their salience was 
weak to the point of undetectability by the analytical 
methods employed in this study. 
CHAPTER VII 


SUMMARY AND SOME SUGGESTIONS 
FOR FURTHER RESEARCH 


The experiments reported in this paper, and a review of 
the relevant scaling literature, suggest that a small number 
of perceptual dimensions (two or three) are of paramount 
significance for the recognition of consonantal sounds 
embedded in an isolated monosyllabic frame. The strongest 
and most easily replicable dimension, which was labelled 
Resonance-Hiss, is conceivably employed, not simply for 
broadly differentiating the consonants as perceptual 
targets, but for providing the acoustic basis for 
segmentation of the signal into consonantal and vocalic 
frames - an operation which would likely provide essential 
information for the functioning of higher-order stages of 
perceptual-linguistic processing. Although the loadings of 
the sounds on the Resonance-Hiss axis of the MDS solution 
were fairly accurately predictable on the basis of a simple 
bandfilter analysis of the stimuli, it would seem to be an 
oversimplification to characterise this dimension, in 
acoustic terms, as a "low-to-high" frequency continuum. The 
notion of "degree of spectral coherence" was introduced to 
make provision for the role that the resonating cavity, 
coupled with the source signal, appears to play in locating 
sounds along this continuum. 


A second dimension, identified as a temporal factor, was 
found to be replicable in the present experiments, and was 
discernible also in other studies employing similarity 
scaling (Peters, 1963; Black, 1968; Pruzansky, 1971; Singh, 
Woods, & Becker, 1972) as well as a study of perceptual 
confusions of young children obtained under non-noisy 
listening conditions (Graham & House, 1971). Loadings on 
this dimension correlated highly with the duration of the 
consonantal segments of the test syllables. 


A third dimension, Voicing, was apparent in the results 
of Experiment I, but was not clearly discernible in 
subsequent experiments. It has often been claimed, on the 
basis of experiments with white-noise masking, that Voicing 
is the most salient distinctive feature contrast amongst the 
consonants. However, in noting the strength of the Voicing 
dimension, commentators (such as Shepard, 1972; or 
Studdert-Kennedy & Shankweiler, 1970) have tended to overlook 
the specific experimental conditions that resulted in the 
preservation of auditory information in only the lowest 
frequency band. Similarity rating studies, in the absence of 
high frequency masking, suggest that Voicing is not a strong 
perceptual dimension for phonetically untrained listeners. 


This finding is of general interest for the study of 
perceptual processes underlying speech recognition at the 
phonological level because it points to an imbalance of 
focus in current theoretical discussions which this paper 
may help to redress. A great deal of research effort has 
been directed to the study of phonetic contrasts such as 
Voicing and Place of Articulation for which the recognition 
problem, stated in terms of a mapping between reliable 
acoustic cues in the signal and the invariant perceptual 
target, is known to be quite complex. It is a moot point 
whether these kinds of fine perceptual discriminations 
require the postulation of a genetically "pre-wired" 
phonetic feature detection capability of the human brain (as 
argued in a recent review article by Cutting and Eimas, 
1974). This author is inclined to the view that current 
evidence for such a "nativist" position is highly equivocal 
and that the adaptability of the auditory-perceptual system, 
in conjunction with a learning process directed by the 
phonological exigencies of the hearer's native language, 
provides a sufficient schema for the experimental data 
presently at hand. Resolution of this debate will only be 
possible when a great deal more reliable information is 
obtained about the developmental timetable for the 
acquisition of linguistically relevant sound contrasts, and 
when basic processes of auditory discrimination and 
recognition are better understood than at the present time. 


Setting aside the complexities of the "nativist-empiricist" 
debate in relation to speech perception, there 
remains the broader and feasibly-answerable question of 
whether a specifically "phonetic" or "phonological" level of 
perceptual processing is clearly discernible in "phonemic 
recognition", as the term has been used in this report. The 
alternative view, which appears to be favoured by the 
results of the present experiments, is that subjects' 
responses (as manifest, for example, in similarity 
judgements to simple CV stimuli) are most readily 
explicable, not in terms of the possession of a set of 
specifically linguistic feature detectors, but in terms of 
features that reflect plausible response parameters of 
mammalian auditory systems in general, and the human 
auditory system in particular. The Resonance-Hiss factor 
which appeared to be the most important determinant of the 
derived perceptual configurations in the present experiments 
is, arguably, not a specifically linguistic dimension, but a 
general auditory continuum that subjectively separates 
"resonant", "musical", and "pleasant" sounds from those that 
are "noisy", "harsh", and "unpleasant". Dimensions of 
auditory discrimination (such as Place of Articulation, or 
Voicing, particularly in stop consonants) for which special 
linguistic adaptation of the auditory mechanism seems to be 
necessary (either through learning or heredity, or both) 
appear to play a secondary role in phonemic recognition. 


The imbalance referred to earlier, which lays undue 
stress upon the unique character of speech recognition vis a 
vis other forms of auditory perception, would seem to result 
from an over-concentration of research attention upon a 
highly restricted segment of the auditory domain that is 
generally utilized by listeners in phonemic recognition. Of 


course, the question of phonological factors in phonemic 




recognition has only just begun to be raised in an 
interesting way by experimental studies of speech perception 
and the methodology of MDS has yet to be fully exploited. 
Terbeek and Harshman (1971) have offered some highly 
suggestive evidence, in a cross-language study of vowel 
perception, that language-specific phonological factors 
play an important role in the structure of the perceptual 
space. An investigation of consonantal perception, similar 
to theirs, should prove interesting. 


In the course of the present investigation a 
preliminary but unsuccessful attempt was made to test the 
"speech-mode" hypothesis (Liberman et al., 1967) in the 
context of the MDS paradigm. It was hoped that by gating out 
the steady state vocalic portions of the CV stimuli, and 
replacing them with a periodic, synthetic, "buzz" of roughly 
the same fundamental frequency, intensity, and duration as 
the replaced vowel, it would be possible to generate a set 
of stimuli that subjects would hear as "non-speech" sounds, 
yet with the essential acoustic cues for the recognition of 
the initial consonant preserved. Unfortunately, the 
phonetically untrained ear is not so easily fooled by such 
acoustic conjury (for which Roszypal's elegant PDP-12 gating 
program is in no way to blame). With repeated stimulus 
presentations that are necessitated by the Triadic 
Comparisons method, most of the supposedly "non-speech" 
stimuli became readily recognizable, "funny speech" sounds 


produced, as one subject put it, by "some sleepy dragon". 
A more promising approach (which time, and some 
technical problems with the speech synthesiser, prevented the 
writer from exploring sufficiently for this report) to 
testing, more strongly, the validity of the interpretation 
assigned to the MDS results of the present study, involves 
the use of purely synthetic stimuli. If the two major 
acoustic parameters isolated in these experiments are in 
fact the variables largely responsible for the shape of the 
derived MDS configuration, then degrading the original set 
of stimuli in such a way as to preserve only the variation 
on these two acoustic variables should not substantially 
alter the derived perceptual configuration. 


To preserve the temporal dimension (Consonantal 
Duration or Abruptness of Syllable Onset), the temporal 
envelope of the stimulus is required. For variation in the 
Resonance-Hiss dimension, the bandfilter intensity functions 
(one for the high and the other for the low frequency band) 
that were obtained from the acoustic analysis of the 
original set of scaling stimuli (see Figure 6.3) may be used 
to control the Hiss Amplitude and the Voice Amplitude 
parameters of the PAT speech synthesiser. In this manner a 
"reconstructed" set of scaling stimuli could be obtained 
that would match the original scaling set of CV syllables 
just with respect to those acoustic variables thought to be 
responsible for the basic shape of the derived perceptual 
configuration. MDS of this "new" set of stimuli should yield 
a configuration in substantial agreement with the old - if 
the hypothesised basis for the subjects' similarity 
judgements is correct. 
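
The proposed degradation can be sketched in very crude form. The code below is not a model of the PAT synthesiser: it is an assumed stand-in in which a pulse train (a "buzz") is scaled by a frame-by-frame voice amplitude and uniform random noise (a "hiss") is scaled by a frame-by-frame hiss amplitude. The function name, the sampling rate, the frame length, the fundamental frequency, and the envelope values are all hypothetical.

```python
import random

def resynthesize(voice_env, hiss_env, sample_rate=10000,
                 frame_ms=10, f0=120.0, seed=0):
    """Build a waveform from two amplitude envelopes (one value per frame).

    voice_env scales a periodic pulse train; hiss_env scales white noise.
    Only the temporal envelope and the two band amplitudes are preserved.
    """
    rng = random.Random(seed)
    frame_len = int(sample_rate * frame_ms / 1000)
    period = int(round(sample_rate / f0))
    out = []
    for voice_amp, hiss_amp in zip(voice_env, hiss_env):
        for _ in range(frame_len):
            buzz = 1.0 if len(out) % period == 0 else 0.0
            noise = rng.uniform(-1.0, 1.0)
            out.append(voice_amp * buzz + hiss_amp * noise)
    return out

# Invented envelopes for a fricative-like onset followed by a vowel-like
# portion: hiss dominates the first four frames, voice the last six.
hiss_envelope = [0.2, 0.6, 0.8, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]
voice_envelope = [0.0, 0.0, 0.0, 0.2, 0.7, 1.0, 1.0, 0.9, 0.6, 0.2]

waveform = resynthesize(voice_envelope, hiss_envelope)
```

A stimulus set generated in this way would differ from the originals in everything except the two acoustic variables under test, which is the property the proposed experiment requires.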


More generally, MDS experiments with synthetic auditory 
stimuli are needed to test the validity of some of the 
parametric assumptions of the MDS model itself in the 
context of auditory perception. Until this is done, the 
potential utility of MDS to problems of speech and auditory 


perception will remain in some doubt. 
APPENDIX A 
DISTINCTIVE FEATURE DEFINITIONS 


Anterior: "sounds are produced with an obstruction that is
located in front of the palatoalveolar region of the
mouth [Chomsky and Halle, 1967, 304]." This feature,
strictly speaking, applies only to consonantal sounds.


Compact (vs. Diffuse): "Compact phonemes are characterized
by the relative predominance of one centrally located
formant region [Jakobson, Fant, and Halle (hereafter,
JFH), 1951, 27]." This feature co-classifies consonantal
sounds made with a constriction in the posterior portion
of the oral cavity (velars, palatals) and open vowels.


Continuant: If air-flow through the mouth is not blocked
during production of a sound then such a sound is
labelled continuant. The liquids [l] and [r] are
difficult to classify on this dichotomous scale.


Consonantal: JFH define this feature in acoustic terms "by
the presence of zeros that affect the entire spectrum
[p.19]." C&H define it articulatorily as those sounds
"produced in the midsagittal region of the [oral] cavity
[p.302]." In either case, the close parallel with the
feature Vocalic is obvious and it is doubtful whether
these two features represent distinct dimensions for the
native speaker or listener.


Coronal: All sounds produced with the blade of the tongue
raised above the neutral position are labelled
"coronal". This feature distinguishes dentals, alveolars
and alveopalatal sounds from labials, velars, palatals,
and pharyngeals. It is, strictly speaking, a
consonantal feature.


Frication: This feature characterizes all sounds with a non- 
plosive turbulent noise component. 


Grave (vs. Acute): "This feature means the predominance of
one side of the significant part of the spectrum over
the other. When the lower side of the spectrum
predominates, the phoneme is labeled grave; when the
upper side predominates, we term the phoneme acute
[JFH, p.29]." This feature co-classifies labial and velar
consonants and vowels with a low second formant. JFH
admit that a complex normalization of the signal would be
necessary to achieve automatic separation of speech
sounds on this hypothetical dimension.


High: sounds are produced with the tongue body elevated
above the "neutral" position. Velar and palatal
consonants are regarded as "high", as are the
traditional high vowels.




Low: sounds are produced with the body of the tongue below
the neutral position. The traditional low vowels and
pharyngeal and glottal consonants are regarded as "low".


Nasal: sounds are characterized by a lowered velum, with or 
without closure of the oral cavity. The acoustic 
coupling of the nasal cavity introduces additional poles 
and zeros into the supraglottal transfer function. 


Place (SB): a four-valued categorical place of articulation
feature used by Singh and Black (1968), classifying
sounds into (1) labials, (2) dentals and alveolars,
(3) palatals, (4) velars.


Sibilant: sibilants are sounds possessing the greatest amount
of turbulent noise. This feature separates the alveolar
and alveopalatal fricatives from the softer labiodental
fricatives and all other noise-weak sounds.


Strident: "sounds are marked by greater noisiness than their 
non-strident counterparts [C&H, p.329]." For C&H the 
relatively weak frication in the English labiodental 
fricatives is sufficiently strong to be "strident". JFH 
on the other hand regard the labiodental fricatives as 
non-strident. 


Tense (vs. Lax): "Tense sounds are produced with a
deliberate, accurate, maximally distinct gesture that
involves considerable muscular effort [C&H, p.324]." This
feature differentiates voiceless from voiced consonants
and vowels according to their degree of constriction
(amount of movement of the tongue body from the neutral
position).


Voice: The voicing feature has been variously defined in
articulatory and acoustic terms. Most simply it is
characterized by the presence of glottal activity up to
the point of maximal constriction in the articulation of
the sound. The presence of a spectral voice bar and
voice onset time (VOT) are often treated as defining
characteristics of this feature. With respect to the
consonants, this feature is co-extensive with the
tense-lax distinction.


Vocalic: Vocalic sounds possess a "single periodic voice
source whose onset is not abrupt [JFH, p.18]." This
feature is used to distinguish vowel and vowel-like
sounds from consonantal type sounds. C&H characterize
this feature in terms of degree of constriction of the
oral cavity. Formant structure is an important
accompanying, but not defining, characteristic of this
feature.
APPENDIX B
KRUSKAL SCALING EXPERIMENT I


COORDINATES FOR 3-DIMENSIONAL SOLUTION
STRESS = .05

          DI            DII          DIII
pa       0.197         0.548        0.841
ba       0.051         0.793        0.405
ta      -0.789        -0.080        0.817
da      -0.092         1.127       -0.034
ca      -0.776        -0.809        0.026
sa      -0.458        -0.776       -0.790
sa    [illegible]     -1.022       -0.366
ha       0.315        -0.286        0.383
za    [illegible]     -0.392       -0.919
ma       0.836         0.485       -0.098
na       0.740         0.502       -0.204
la       1.104         0.134       -0.029


COORDINATES FOR 2-DIMENSIONAL SOLUTION
STRESS = .10

          DI            DII
pa      -0.932         0.131
ba      -0.754         0.291
ta      -0.734        -0.800
da      -0.715         0.549
ca       0.331        -1.041
sa       1.007        -0.703
sa       0.811        -0.883
ha      -0.047         0.064
za       1.170        -0.456
ma    [illegible]      0.987
na      -0.085         0.962
la       0.107         0.897
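For orientation, the STRESS values quoted with each solution can be read against Kruskal's first stress formula. That the program used this formula rather than a variant is an assumption here; the symbols are introduced only for this illustration.

```latex
\[
S \;=\; \sqrt{\frac{\sum_{i<j}\left(d_{ij}-\hat{d}_{ij}\right)^{2}}
                   {\sum_{i<j} d_{ij}^{2}}}
\]
% d_{ij}:       distance between stimuli i and j in the derived configuration
% \hat{d}_{ij}: fitted value that is monotone with the observed proximities
```

On this reading, a lower value means that the interpoint distances depart less from a monotone function of the judged proximities, so adding a dimension can only leave the stress unchanged or reduce it.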
APPENDIX D
KRUSKAL SCALING EXPERIMENT II


/Ca/ set

     COORDINATES FOR                        COORDINATES FOR
     3-DIMENSIONAL SOLUTION                 2-DIMENSIONAL SOLUTION
     STRESS = .08                           STRESS = .14

         DI          DII       DIII              DI          DII
sa      0.854       0.561    -0.476     sa      1.017       0.648
pa     -0.480      -0.731    -0.281     pa     -0.731      -0.431
ca   [illegible]   -0.443     0.182     ca      1.079      -0.475
ma     -0.929       0.210    -0.245     ma     -0.877       0.250
ta      0.100      -0.646     0.795     ta      0.167      -1.100
la     -0.609       0.572    -0.164     la     -0.842       0.467
da     -0.238      -0.144     0.758     da     -0.086      -0.749
ha     -0.338      -0.142    -0.663     ha     -0.468       0.260
Sa      0.974      -0.027    -0.631     Sa      1.148       0.276
za      0.998       0.833     0.113     za      0.958       0.939
na     -0.576       0.448     0.390     na     -0.669       0.548
ba     -0.791      -0.446     0.223     ba     -0.646      -0.573


/Ci/ set

     STRESS = .06                           STRESS = .09

         DI          DII       DIII              DI          DII
si      0.877      -0.437     0.243     si   [illegible] [illegible]
pi     -0.659      -0.530    -0.174     pi     -0.731      -0.431
ci      1.141      -0.075    -0.613     ci      1.079      -0.475
mi     -0.475       0.895     0.230     mi     -0.877       0.250
ti     -0.476      -0.898     0.200     ti      0.147      -1.100
li     -0.563       0.608    -0.365     li     -0.842       0.250
di     -0.545      -0.545     0.177     di     -0.086      -0.749
hi     -0.159    [illegible] -0.557     hi     -0.468       0.260
Si      1.105       0.245    -0.532     Si      1.148       0.216
zi      1.939      -0.056     0.408     zi      0.958       0.939
ni     -0.573       0.630     0.367     ni     -0.669       0.548
bi     -0.708      -0.087     0.617     bi     -0.646      -0.573
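The "Kruskal interpoint distances" used as a criterion in the regression appendices are distances between pairs of points in a configuration such as the one above. A minimal sketch of that computation follows, using four rows of the 2-dimensional /Ca/ coordinates. The Euclidean metric and the helper name `interpoint_distances` are assumptions made for illustration.

```python
import math
from itertools import combinations

# Four stimuli from the 2-dimensional /Ca/ solution (DI, DII).
coords = {
    "pa": (-0.731, -0.431),
    "ma": (-0.877, 0.250),
    "za": (0.958, 0.939),
    "ba": (-0.646, -0.573),
}


def interpoint_distances(points):
    """Return the Euclidean distance for every unordered pair of stimuli."""
    result = {}
    for a, b in combinations(points, 2):
        squared = sum((x - y) ** 2 for x, y in zip(points[a], points[b]))
        result[(a, b)] = math.sqrt(squared)
    return result


distances = interpoint_distances(coords)
for pair, d in sorted(distances.items(), key=lambda item: item[1]):
    print(pair, round(d, 3))
```

With twelve stimuli the same function yields 66 pairwise distances, which is the number of observations each regression on interpoint distances would have to work with.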
APPENDIX E


PROJECTION OF STIMULI IN 4 DIMENSIONS


     I         [illegible]       IV
  -0.5762       -0.1708        0.3966
   0.9162        0.2562        0.1948
  -0.4630       -0.0357       -0.1521
  -0.0719        0.1084       -0.2553
  -0.3704        0.1028       -0.3236
  -0.4307       -0.1413        0.0649
  -0.3691        0.0262        0.2055
  -0.3696        0.1241       -0.0052
   0.5775       -0.4325       -0.0816
   0.8716        0.4279        0.0268
  -0.3168        0.2956       -0.0106
   0.6023       -0.5627       -0.0603
APPENDIX F CONTINUED


UNROTATED PRINCIPAL AXES FACTORS
/Ca/ SET EXPERIMENT II


      COMMUNALITIES       I         II        III        IV
sa        0.944        -0.792     0.485     -0.157    -0.239
pa        0.981         0.716    -0.086      0.515    -0.443
ca        0.939        -0.631    -0.420      0.273     0.538
ma        0.934         0.856     0.380      0.024     0.239
ta        0.855         0.459    -0.781     -0.174    -0.061
la        0.831         0.690     0.449     -0.184     0.346
da        0.704         0.672    -0.317     -0.387     0.036
ha        0.959         0.657     0.389      0.600    -0.123
sa        0.877        -0.809     0.154      0.426     0.131
za        0.971        -0.706     0.402     -0.462    -0.312
na        0.926         0.756     0.399     -0.173     0.407
ba        0.912         0.873    -0.037     -0.137    -0.359

                        6.326     1.973      1.384

PERCENT OF COMMON VARIANCE
                       58.998    18.216     12.747    10.611

PERCENT OF TOTAL VARIANCE
         90.267        52.714    16.465     11.530     9.578


VARIMAX ROTATED FACTORS

           I            II           III          IV
sa      -0.818       -0.318       -0.414        0.056
pa       0.294     [illegible]     0.872        0.366
ca    [illegible]  [illegible]   [illegible]   -0.891
ma    [illegible]     0.831        0.383     [illegible]
ta       0.905       -0.149        0.032        0.108
la       0.008        0.874        0.108     [illegible]
da       0.699     [illegible]    -0.045        0.338
ha      -0.098        0.400        0.869        0.187
sa      -0.626       -0.386       -0.035       -0.579
za      -0.653       -0.300       -0.620        0.263
na    [illegible]     0.921        0.130        0.180
ba       0.479        0.295        0.374        0.675

PERCENT OF COMMON VARIANCE
        29.455       28.690       22.499       19.356
APPENDIX F CONTINUED


UNROTATED PRINCIPAL AXES FACTORS
/Ci/ SET EXPERIMENT II


      COMMUNALITIES        I            II           III          IV
sa        0.915         -0.667        0.313        0.583        0.180
pa        0.792          0.694        0.473       -0.284        0.073
ca        0.958         -0.825       -0.030     [illegible]     0.011
ma        0.960       [illegible]    -0.596     [illegible]    -0.224
ta        0.843          0.604        0.680       -0.019        0.128
la        0.850          0.719       -0.443        0.028        0.368
da        0.926          0.682        0.674       -0.004        0.078
ha        0.907       [illegible]    -0.341       -0.124        0.740
sa        0.908         -0.866       -0.232     [illegible]    -0.012
za        0.934         -0.617        0.087        0.720        0.165
na        0.961       [illegible]  [illegible]     0.262       -0.189
ba        0.842          0.825     [illegible]     0.106     [illegible]

                         6.020        2.365        1.430        0.975

PERCENT OF COMMON VARIANCE
                        55.788       21.920     [illegible]     9.041

PERCENT OF TOTAL VARIANCE
         89.918         50.164       19.710     [illegible]     8.129


VARIMAX ROTATED FACTORS

           I            II           III
sa      -0.116     [illegible]   [illegible]
pa       0.786        0.022        0.394
ca      -0.569       -0.762      [illegible]
ma      -0.018        0.892        0.350
ta    [illegible]    -0.019        0.075
la       0.137        0.552        0.239
da       0.954        0.057        0.108
ha       0.090        0.149        0.147
sa      -0.740       -0.578       -0.010
za      -0.253       -0.110       -0.917
na      -0.018        0.929        0.226
ba       0.649        0.565        0.280

         3.796        3.343      [illegible]
APPENDIX G


UNROTATED PRINCIPAL AXES FACTORS
EXPERIMENT IV


      COMMUNALITIES       I           II
pa        0.850         0.887       -0.251
ma     [illegible]     -0.060        0.974
ca        0.865        -0.139       -0.919
ba        0.796         0.874     [illegible]
sa        0.933        -0.965       -0.046
ta        0.808         0.715       -0.545
na        0.934         0.031        0.966
za        0.891        -0.752       -0.570
la     [illegible]     -0.046        0.877
ha        0.416        -0.242        0.598
sa        0.658        -0.752       -0.304
da        0.938         0.954       -0.170

                        5.119        4.694

PERCENT OF COMMON VARIANCE
                       52.166       47.834

PERCENT OF TOTAL VARIANCE
         81.776        42.659       39.117


VARIMAX ROTATED FACTORS

           I           II
pa       0.872       -0.301
ma      -0.005     [illegible]
ca      -0.191       -0.910
ba       0.883        0.132
sa      -0.966        0.009
ta       0.683       -0.584
na       0.086        0.963
za      -0.754       -0.520
la       0.004        0.878
ha      -0.208        0.621
sa      -0.768       -0.260
da       0.942       -0.224

PERCENT OF COMMON VARIANCE
        52.152       47.848
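The columns of the factor tables are linked by a simple identity: a stimulus's communality is the sum of its squared loadings on the retained factors. The sketch below checks that identity on three rows of the unrotated Experiment IV solution. The tolerance of 0.002 is an assumption meant to absorb three-decimal rounding, and the helper name is hypothetical.

```python
# Unrotated loadings (factor I, factor II) and reported communalities
# for three stimuli of Experiment IV.
loadings = {
    "ca": (-0.139, -0.919),
    "ha": (-0.242, 0.598),
    "sa": (-0.752, -0.304),
}
reported = {"ca": 0.865, "ha": 0.416, "sa": 0.658}

TOLERANCE = 0.002  # assumed allowance for rounding in the printed values


def communality(row):
    """Sum of squared loadings across the retained factors."""
    return sum(value * value for value in row)


for name, row in loadings.items():
    computed = communality(row)
    agrees = abs(computed - reported[name]) <= TOLERANCE
    print(name, round(computed, 3), reported[name], agrees)
```

Because an orthogonal rotation leaves communalities unchanged, the same check can be applied to the varimax loadings, which makes it a convenient way to confirm that a row of loadings has been paired with the right stimulus.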


APPENDIX H
PHONOLOGICAL FEATURES: REGRESSION ANALYSES


PREDICTORS: Jakobson, Fant, and Halle features
CRITERION:  Kruskal interpoint distances

FEATURES           MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Strident-M.          0.780        0.608        0.608     [illegible]
Consonantal-V.       0.789        0.622        0.014       -0.097
Continuant-I.        0.799        0.639        0.017     [illegible]
Grave-Acute          0.801        0.642        0.003        0.055
Compact-Diffuse      0.803        0.644        0.002       -0.062
Tense-Lax            0.805        0.647        0.003        0.053
Nasal-Non nasal      0.805        0.647        0.001        0.024


PREDICTORS: Jakobson, Fant, and Halle features
CRITERION:  Raw proximity scores

FEATURES           MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Strident-M.          0.725        0.521     [illegible]     0.676
Tense-Lax            0.743        0.552        0.041     [illegible]
Continuant-I.        0.763        0.582        0.030     [illegible]
Grave-Acute          0.773        0.597        0.015     [illegible]
Nasal-Non nasal   [illegible]     0.602        0.004        0.076
Compact-Diffuse      0.777        0.604        0.002        0.088
Consonantal-V.       0.780        0.608        0.004       -0.068
Vocalic-N.           0.781        0.610        0.002     [illegible]


PREDICTORS: Chomsky and Halle features
CRITERION:  Kruskal interpoint distances

FEATURES           MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Strident             0.780        0.608        0.608        0.745
Low                  0.789        0.624        0.014       -0.134
Continuant           0.799        0.639        0.017        0.145
Coronal              0.805        0.649        0.010        0.098
Voiced-Vcls.         0.806        0.651        0.003        0.058
Anterior             0.808        0.653        0.002       -0.047
Nasal                0.809        0.654        0.001        0.026


PREDICTORS: Chomsky and Halle features
CRITERION:  Raw proximity scores

FEATURES           MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Strident             0.715        0.511        0.511        0.637
Voiced-Vcls.         0.743        0.552        0.041        0.210
Coronal              0.767        0.589        0.036     [illegible]
Continuant           0.789        0.624        0.035        0.206
Low                  0.798        0.637        0.013       -0.160
Anterior             0.804        0.647        0.010     [illegible]
Nasal                0.807        0.652        0.004        0.077
APPENDIX H (CONTINUED)


PREDICTORS: Singh and Black features
CRITERION:  Kruskal interpoint distances

FEATURES           MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Duration             0.598        0.358        0.358        0.480
Frication            0.674        0.454        0.096        0.336
Nasal                0.696        0.485        0.031        0.264
Place                0.721        0.520        0.034       -0.205
Voice-Vcls.       [illegible]     0.520        0.000        0.016


PREDICTORS: Singh and Black features
CRITERION:  Raw proximity scores

FEATURES           MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Frication            0.552        0.305        0.305        0.356
Duration             0.635        0.403        0.098        0.400
Nasal                0.660        0.435        0.032        0.249
Voiced-Vcls.         0.684        0.468        0.032        0.180


PREDICTORS: Ingram features (set #1)
CRITERION:  Kruskal interpoint distances

FEATURES           MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Duration             0.640        0.409        0.409        0.548
Resonance-Hiss       0.841        0.708        0.298        0.695
Nasal                0.859        0.739        0.031     [illegible]
Voiced-Vcls.         0.861        0.741        0.001       -0.051
Place                0.862        0.743        0.001        0.043


PREDICTORS: Ingram features (set #2)
CRITERION:  Kruskal interpoint distances

FEATURES           MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Sibilance            0.779        0.608        0.608        0.752
Resonance            0.822        0.676        0.067        0.266
Stop                 0.859        0.738        0.062        0.249
Voiced-Vcls.         0.860        0.740        0.002        0.048
Nasal                0.860        0.741        0.001       -0.035
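The columns of these stepwise tables constrain one another: R SQUARE should equal the square of MULTIPLE R, and RSQ CHANGE should equal the increase in R SQUARE over the previous step. The sketch below applies both checks to the Singh and Black raw proximity table. The function name and the 0.002 rounding tolerance are assumptions made for illustration.

```python
# (feature, MULTIPLE R, R SQUARE, RSQ CHANGE) for the Singh and Black
# features regressed on the raw proximity scores.
steps = [
    ("Frication", 0.552, 0.305, 0.305),
    ("Duration", 0.635, 0.403, 0.098),
    ("Nasal", 0.660, 0.435, 0.032),
    ("Voiced-Vcls.", 0.684, 0.468, 0.032),
]


def check_steps(rows, tol=0.002):
    """Return descriptions of rows that violate the two identities."""
    problems = []
    previous_r2 = 0.0
    for name, r, r2, change in rows:
        if abs(r * r - r2) > tol:
            problems.append(f"{name}: R squared does not match MULTIPLE R")
        if abs((r2 - previous_r2) - change) > tol:
            problems.append(f"{name}: RSQ CHANGE does not match the increment")
        previous_r2 = r2
    return problems


print(check_steps(steps))
```

A row that fails either check points to a likely transcription error in one of its entries. The increments also show how quickly the gains fall off: here the first two predictors account for most of the explained variance.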
APPENDIX I
RATING SCALES: REGRESSION ANALYSIS


PREDICTORS: Verbal rating scales of sound quality
CRITERION:  Kruskal interpoint distances

SCALE              MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Hissy                0.762        0.581        0.581     [illegible]
Abrupt               0.846     [illegible]     0.305        0.286
Vowel-like           0.882        0.778        0.063        0.335
Distinct             0.903        0.815        0.036     [illegible]
Harsh                0.906        0.822        0.006        0.255
Melodious            0.914        0.836        0.014     [illegible]
Loud              [illegible]     0.840        0.003     [illegible]
Short-Long        [illegible]     0.841        0.001     [illegible]
Clear                0.939        0.845        0.003       -0.230


PREDICTORS: Verbal rating scales of sound quality
CRITERION:  Raw proximity scores

SCALE              MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
Hissy                0.697        0.486        0.486     [illegible]
Abrupt               0.757        0.573        0.086        0.226
Harsh                0.780        0.608        0.035        0.092
Distinct             0.792        0.628        0.020     [illegible]
High-Pitch           0.803        0.646        0.018        0.256
Vowel-like           0.809        0.655        0.010     [illegible]
Even                 0.812        0.659        0.003        0.087
Short-Long           0.813        0.660        0.002        0.077
APPENDIX K


ACOUSTIC PROPERTIES: REGRESSION ANALYSES


PREDICTORS: BPGP - High (band-pass) filter peak amplitude,
            log. scale.
            LPGP - Low-pass filter peak amplitude,
            log. scale.
CRITERION:  Loadings on Kruskal "Resonance - Hiss" factor.

VARIABLE           MULTIPLE R    R SQUARE    RSQ CHANGE     BETA
BPGP                 0.857        0.734        0.734       -0.755
LPGP                 0.913        0.834        0.100     [illegible]


PREDICTOR:  CDUR - Duration of consonant
CRITERION:  Loadings on Kruskal "Duration" factor

VARIABLE           SIMPLE R      R SQUARE        B          BETA
CDUR                 0.954        0.911     [illegible]     0.954
CONSTANT = -12.995
APPENDIX L
BANDFILTER FUNCTIONS OF EXPERIMENTAL STIMULI